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Abstract 


The  focus  of  this  project  is  to  develop  and  demonstrate  the  fusion  methodologies  to 
enhance  the  ability  to  integrate  multi-source  infonnation  and  to  assess  fusion 
performance  for  numerous  applications  such  as  situational  awareness,  surveillance,  and 
tracking.  The  focus  of  the  Year  1  effort  was  on  developing  a  solid  theoretical  foundation 
and  on  developing  autonomous  and  efficient  infonnation  fusion  algorithms  with 
distributed  sensors.  The  focus  of  Year  2  effort  was  to  develop  a  set  of  fusion  performance 
modeling  methodologies  based  on  explicit  links  of  spatial  and  temporal  relationships 
between  target  features  and  sensor  observations.  We  have  accomplished  the  goals. 
Specifically,  we  developed  a  set  of  scalable  fusion  algorithms  and  the  conesponding 
theoretical  performance  analysis  in  a  dynamic  sensor  network  environment.  We  have  also 
developed  a  framework  for  quantifying  the  classification  performance  of  a  set  of  sensors 
with  varying  qualities  based  on  local  confusion  matrix  and  global  confusion  matrix  using 
Bayesian  network  model.  In  addition,  we  have  developed  a  software  tool  based  on 
UnBBayes  open  source  environment  to  test  the  performance  modeling.  The  resulting 
methodology  has  significant  potential  for  applications  in  high  level  fusion  and  situational 
assessment. 

This  report  summarizes  our  research  accomplishments  during  the  performance  period 
from  August  2008  to  August.  2010. 


1.  Introduction 


Distributed  information  and  data  fusion  with  a  suite  of  sensor  systems  provide  core  capabilities  to 
numerous  applications  such  as  situational  awareness,  surveillance,  navigation,  and  tracking.  The 
increasing  capabilities  and  ubiquity  of  computing  technologies  and  data  networks  continue  to 
advance  information  exploitation  abilities  for  combining  data  from  multiple  remote  sources  to 
produce  increased  value.  In  particular,  information  fusion  is  a  key  enabler  under  development  for 
the  United  States  Department  of  Defense  (US  DoD)  network-centric  Command,  Control, 
Communications  and  Intelligence  (C3I)  capabilities. 

To  date,  the  advances  in  multi-sensor  fusion  systems  have  focused  on  algorithms  for  tracking  and 
fusion,  building  design  architectures  and  rules  for  identifying  information  duplication  (i.e.,  rumor 
propagation  or  double  counting),  and  operation,  allocation,  and  self-calibration  of  programmed  or 
ad  hoc  sensor  networks. 

While  researchers  in  the  field  of  sensor  data  fusion  have  advanced  significantly  during  the  last 
decade,  these  algorithms  have  been  limited  for  the  most  part  to  relatively  well-defined  network 
architectures.  A  unified  fusion  technique  for  general  sensor  networks  is  yet  to  be  developed.  The 
goal  of  this  effort  is  develop  a  solid  theoretical  foundation  and  a  set  of  algorithms  that  can  be 
readily  implemented.  In  particular,  we  proposed  to  develop  and  integrate  the  four  components  of 
research  described  below: 

•  A  theoretical  foundation  for  networked  sensor  fusion  with  arbitrary  connectivity  and 
message  delays  as  well  as  random  or  non-synchronous  local  sensing  and  communication 
rates  while  minimizing  the  amount  of  data  exchanged  between  agents. 

•  A  set  of  practical  autonomous  fusion  algorithms  and  a  new  methodology  for  the  design 
and  selection  of  fusion  rules,  communication  architecture,  and  deployment  configuration 
of  distributed  sensor  networks. 

•  A  general  framework  for  quantifying  the  operational  characteristics  of  a  sensor,  a  set  of 
metrics  for  evaluating  the  operational  performance  of  a  sensor  network,  and  evaluating 
the  fusion  performance  of  multiple,  asynchronous  sensors  of  varying  quality. 

•  A  test  and  simulation  environment  to  validate  and  support  design  of  distributed  fusion 
algorithms  and  performance  modeling  and  to  be  available  for  integration  into  government 
systems. 


2.  Project  Tasks 

The  general  research  tasks  for  this  research  as  proposed  are  summarized  thus: 

•  Develop  autonomous  information  fusion  algorithms  with  distributed  sensors.  The  focus 
is  on  developing  a  solid  theoretical  foundation  and  a  set  of  scalable  and  efficient 
algorithms.  We  focus  on  level- 1  and  level-2  fusion  as  defined  by  the  DFIG  model. 

•  Develop  a  set  of  fusion  performance  modeling  methodologies  based  on  explicit  links  of 
spatial  and  temporal  relationships  between  target  features  and  sensor  observations.  The 
focus  is  on  developing  Bayesian  network  modeling  and  inference  algorithms  with  high 
level  situational  assessment. 


•  Develop  test  and  simulation  to  validate  and  support  design  of  distributed  fusion 
algorithms  and  performance  modeling.  A  main  task  is  to  develop  metrics  for  quantifying 
the  performance  of  the  system  then  apply  the  metrics  to  evaluate  and  compare  alternative 
fusion  strategies.  The  emphasis  will  be  on  testing  algorithm  scalability  and  real  time 
capability. 


3.  Project  Schedule  and  Milestones 

The  original  project  schedule  and  milestones  are  given  in  Table  1.  The  original  project 
Work  Plan  Schedule  is  shown  in  Table  2. 


Table  1.  Project  Schedule  and  Milestones 
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1.  Develop  fusion  performance  modeling  methodologies  based  on  explicit  links  of  spatial  and  temporal  relationships  between  target 
features  and  sensor  observations. 

(a)  Develop  Bayesian  network  modeling  for  low  and  high  level  situational  assessment 

(b)  Develop  efficient  BN  inference  algorithms  for  hybrid  models 

(c)  Develop  fusion  performance  modeling  and  evaluation  methodologies  with  a  set  of  defined  performance  metrics 
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2.  Develop  methodology  and  software  prototype  to  validate  and  evaluate  the  performance  of  the  fusion  system 
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(a)  Develop  adaptive  model  resolution  and  metrics  for  quantifying  the  performance  of  the  system 

c 

(b)  Develop  MATL4B  and  Java  based  test  environment  to  validate  performance  methodologies 
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(c)  Initiate  technology  transferto  industry  or  government  as  specified  by  AFOSR. 

v> 

3.  Support  the  technical  exchanges  and  special  studies  as  required  by  the  AFOSR  Program  Manager. 

(Z 

(a)  Attend  and  participate  in  technical  interchange  meetings  to  discuss  technical  issues  related  to  the  research  tasks. 

(b)  Lead  and  participate  in  special  studies  as  required. 

o 
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(c)  Document  and  distribute  the  technical  findings  and  the  simulation  results  as  needed. 

■o' 

4.  Management  and  Reporting. 

a. 

(a)  Prepare  monthly  financial  reports  and  annual  technical  progress  reports. 

(b)  Prepare  semi-annual  progress  review  and  comprehensive  annual  technical  reports. 

Table  2.  Original  Work  Plan  Schedule 
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l.a  Develop  Bayesian  network  modeling  tool  for  situation 
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l.b  Develop  efficient  hybrid  BN  inference  algorithms 

l.c  Develop  fusion  performance  modeling  and  metrics 
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2.ab  Develop  software  prototype  to  evaluate  performance 
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4.  Prepare  project  progress  review  and  technical  reports 
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4.  Change  in  AFOSR  Program  Manager  and  Project  Scope 


The  project  was  started  in  August  2008.  Soon  after  the  project  was  started,  the  program 
manager  was  changed  from  Dr.  David  Luginbuhl  to  Dr.  Doug  Cochran  in  late  2008. 
After  a  brief  discussion  with  the  PI  in  summer  2009,  the  new  program  manager,  Dr.  Doug 
Cochran,  felt  that  the  project  “is  not  sufficiently  aligned  with  the  scientific  direction  the 
AFOSR  Information  Fusion  program  is  moving  to  justify  continued  investment”  and 
decided  to  terminate  the  project  in  July  2009  before  a  formal  annual  review  which  took 
place  in  Oct.  2009.  The  project  option  was  not  exercised  but  a  no-cost  extension  of  the 
effort  to  August  2010  was  subsequently  granted  by  Dr.  Cochran.  The  program  manager 
was  changed  from  Dr.  Doug  Cochran  to  Dr.  Robert  Bonneau  in  2010. 

Because  of  the  50%  funding  reduction  of  the  project  due  to  early  termination,  we  were 
not  able  to  complete  all  the  tasks  as  planned  in  the  original  proposal.  However,  we  did 
try  our  best  to  accomplish  as  much  technical  tasks  as  we  could.  The  details  are  described 
in  Section  6. 

5.  Project  Management 

This  research  project  was  directed  by  Dr.  KC  Chang  of  George  Mason  University,  who 
devoted  20%  of  his  time  during  the  academic  year  and  six  weeks  during  the  summer  to 
this  research.  The  research  effort  was  performed  by  Dr.  KC  Chang  (PI)  together  with  a 
research  faculty  and  two  graduate  students.  Specifically, 

1.  A  part-time  (25%)  research  faculty,  Dr.  Wei  Sun,  who  worked  on  the 
development  of  efficient  hybrid  inference  algorithms  and  the  development  of 
metrics  to  quantify  the  overall  performance  of  the  systems. 

2.  A  part-time  (50%)  PhD  student,  Mr.  Rommel  Carvalho,  who  focused  on  the 
development  of  a  theoretical  foundation  and  analytical  methodology  for 
predicting  fusion  performance  assessments  as  well  as  the  software  development 
for  the  simulation  environment. 

3.  A  full-time  MS  student,  Mr.  Ashirvad  Naik,  who  worked  on  developing  a 
modeling  and  simulation  environment  with  MATLAB  to  support  specification 
and  performance  evaluations,  and  the  validation  of  the  proposed  methodologies 
with  test  cases  and  established  benchmark  problems. 


6.  Technical  Accomplishments 

During  the  two  years  effort  of  the  project,  we  have  been  developing  rigorous 
mathematical  foundation  and  a  set  of  algorithms  for  distributed  fusion  in  dynamic 
networks.  In  particular,  we  have  accomplished  the  following: 

•  A  mathematical  foundation  based  on  information  genealogy  for  networked  sensor 
fusion  with  arbitrary  connectivity  and  message  delays  as  well  as  a  set  of  practical 
autonomous  information  fusion  and  dissemination  algorithms.  We  have 
documented  and  published  several  papers  on  this  area  [1-2,8],  The  papers  were 


well  received.  Specifically,  an  earlier  of  the  paper  [8]  published  in  Fusion  2008 
was  the  runner-up  of  the  best  paper  award  (top  1%  of  the  300+  papers). 

•  Complementary  multi-level  dynamic  Bayesian  network  (DBN)  modeling  and 
inference  algorithms  that  provide  the  infrastructure  to  aggregate  traditional  and 
non-traditional  data  from  disparate  sources  at  each  fusion  level.  In  particular, 
scalable  inference  in  distributed  hybrid  Bayesian  networks  is  an  important  area 
for  research  but  remains  a  difficult  task  because  of  its  potentially  arbitrary 
distributions  and  possible  nonlinear  dependence  relationships  between  variables. 
We  proposed  a  unified  computing  scheme  of  messages  propagating  between 
different  types  of  variables.  We  have  documented  and  published  several  papers 
on  this  area  [4-5,7]. 

•  Performance  analysis  of  a  multisensor  fusion  system  modeled  by  a  Bayesian 
network.  Multi-Sensor  Fusion  is  founded  on  the  principle  that  combining 
information  from  different  sensors  will  enable  a  better  understanding  of  the 
surroundings.  However,  it  would  be  desirable  to  evaluate  how  much  one  gains  by 
combining  different  sensors  in  a  fusion  system,  even  before  implementing  it.  We 
developed  a  state-of-the-art  tool  that  allows  a  user  to  evaluate  the  classification 
performance  of  a  multisensor  fusion  system  modeled  by  a  Bayesian  network. 
Specifically,  the  results  was  documented  in  a  paper  published  in  Fusion  2009  [3] 
which  received  one  of  the  best  student  paper  awards. 

•  Mixture  distribution  representation  and  metrics  for  scalable  fusion  -  Mixture 
distributions  have  been  used  in  many  applications  for  dynamic  state  estimation 
including  distributed  tracking,  and  multisensor  fusion.  However,  the  recursive 
processing  of  the  mixture  distributions  incurs  rapidly  growing  computational 
requirements.  In  order  to  keep  the  computational  complexity  tractable  and  to 
ensure  scalability  while  trading-off  performance,  we  developed  a  recursive 
mixture  reduction  algorithm  with  a  given  error  bound.  We  have  documented  and 
published  our  work  in  [6,9]. 


7.  Technology  Transfer 

We  have  been  working  with  several  small  businesses  to  apply  our  technology  to  other 
applications.  For  example,  we  have  been  working  with  Mr.  Mark  Frymire  and  Dr.  Chris 
Smith  of  Decisive  Analytic  Corporation  to  apply  the  scalable  fusion  technique  we 
developed  in  this  effort  for  missile  defense  application  [9].  We  have  also  worked  with  Dr. 
Craig  Agate  of  Toyon  corporation  on  applying  our  fusion  techniques  for  ad  hoc  UAV 
sensor  networks  [10]. 
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I.  INTRODUCTION 


Message  Passing  for 
Hybrid  Bayesian  Networks: 
Representation,  Propagation, 
and  Integration 

WEI  SUN 

K.  C.  CHANG 

George  Mason  University 


The  traditional  message  passing  algorithm  was  originally 
developed  by  Pearl  in  the  1980s  for  computing  exact  inference 
solutions  for  discrete  polytree  Bayesian  networks.  When  a  loop 
is  present  in  the  network,  propagating  messages  are  not  exact, 
but  the  loopy  algorithm  usually  converges  and  provides  good 
approximate  solutions.  However,  in  general  hybrid  Bayesian 
networks,  the  message  representation  and  manipulation  for 
arbitrary  continuous  variable  and  message  propagation  between 
different  types  of  variables  are  still  open  problems.  The  novelty  of 
the  work  presented  here  is  to  propose  a  framework  to  compute, 
propagate,  and  integrate  messages  for  hybrid  models.  First,  we 
combine  unscented  transformation  and  Pearl’s  message  passing 
algorithm  to  deal  with  the  arbitrary  functional  relationships 
between  continuous  variables  in  the  network.  For  the  general 
hybrid  model,  we  partition  the  network  into  separate  network 
segments  by  introducing  the  concept  of  interface  node.  We 
then  apply  different  algorithms  for  each  subnetwork.  Finally 
we  integrate  the  information  through  the  channel  of  interface 
nodes  and  then  estimate  the  posterior  distributions  for  all  hidden 
variables.  The  numerical  experiments  show  that  the  algorithm 
works  well  for  nonlinear  hybrid  BNs. 
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Bayesian  network  (BN),  also  known  as  probability 
belief  network,  causal  network,  [7,  23,  24]  is  a 
graphical  model  for  knowledge  representation  under 
uncertainty  and  a  popular  tool  for  probabilistic 
inference.  It  models  dependence  relationships  between 
random  variables  involved  in  the  problem  domain  by 
conditional  probability  distributions  (CPDs).  In  the 
network,  CPD  is  encoded  in  the  directed  arc  linking 
the  associated  random  variables.  The  random  variables 
that  have  arcs  pointing  to  other  random  variables 
are  called  parent  nodes  and  the  random  variables 
that  have  incoming  arcs  are  called  children  nodes. 

The  most  important  property  of  the  BN  is  that  it 
fully  specifies  the  joint  distribution  over  all  random 
variables  by  a  product  of  all  CPDs.  This  is  because 
each  random  variable  is  conditional  independent 
of  its  nondescendant  given  its  parents.  Factoring 
reduces  the  numbers  of  parameters  representing 
the  joint  distribution  and  so  saves  the  computations 
for  reasoning.  One  of  the  important  tasks  after 
constructing  the  BN  model  is  to  conduct  probabilistic 
inference.  However,  this  task  is  NP-hard  in  general 
[8].  This  is  true  even  for  the  seemingly  easier  task 
of  finding  approximate  solutions  [10],  Nevertheless, 
for  some  special  classes  such  as  discrete  polytree 
or  linear  Gaussian  polytree  networks,  there  exists 
an  exact  inference  algorithm  using  message  passing 
[24]  that  could  be  done  in  linear  time.  In  the  past 
decades,  researchers  have  proposed  a  great  number  of 
inference  algorithms  for  various  BNs  in  the  literature 
[12],  They  can  be  divided  into  two  basic  groups:  exact 
and  approximate  algorithms.  Exact  inference  only 
works  for  very  limited  types  of  networks  with  special 
structure  and  CPDs  in  the  model.  For  example,  the 
most  popular  exact  inference  algorithm — Clique  tree 
[20,  28],  also  known  as  junction  tree  or  clustering 
algorithm  [13] — only  works  for  a  discrete  network 
or  the  simplest  hybrid  model  called  conditional  linear 
Gaussian  (CLG)  [18].  In  general,  the  complexity 
of  the  exact  inference  is  exponential  to  the  size  of 
the  largest  clique1  of  the  triangulated  moral  graph 
in  the  network.  For  networks  with  many  loops  or 
general  hybrid  models  that  have  mixed  continuous  and 
discrete  variables,  the  intractability  rules  out  the  use  of 
the  exact  inference  algorithms. 

For  probabilistic  inference  with  hybrid  models, 
relatively  little  has  been  developed  so  far.  The  simplest 
hybrid  model  CLG  is  the  only  hybrid  model  for  which 
exact  inference  could  be  done.  The  state-of-the-art 
algorithm  for  exact  inference  in  CLG  is  Lauritzen’s 
algorithm  [17,  19].  It  computes  the  exact  answers  in 
the  sense  that  the  first  two  moments  of  the  posterior 
distributions  are  correct,  while  the  true  distribution 
might  be  a  mixture  of  Gaussians.  In  general,  the 
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hybrid  model  may  involve  arbitrary  distributions  and 
arbitrary  functional  relationships  between  continuous 
variables.  It  is  well  known  that  no  exact  inference  is 
possible  in  this  case.  However,  approximate  methods 
have  been  proposed  [6,  16]  to  handle  different  hybrid 
models.  In  recent  years,  researchers  also  proposed 
inference  algorithms  using  mixture  of  truncated 
exponentials  (MTE)  [9,  21]  to  approximate  arbitrary 
distributions  in  order  to  derive  the  close-form  solution 
for  inference  in  hybrid  models. 

Generally,  there  are  three  main  categories  of 
approximate  inference  methods  for  BNs:  model 
simplification,  stochastic  sampling,  and  loopy 
belief  propagation.  Model  simplification  methods 
simplify  the  model  to  make  the  inference  algorithm 
applicable.  Some  commonly  applied  simplification 
methods  include  the  removal  of  weak  dependency, 
discretization,  and  linearization.  Stochastic  sampling 
is  a  popular  framework  including  a  number  of 
algorithms,  such  as  likelihood  weighting  (LW) 

[11,  27]  and  the  state-of-the-art  importance  sampling 
algorithm  called  adaptive  importance  sampling 
(AIS-BN)  for  discrete  BNs  [5],  The  major  issue 
for  sampling  methods  is  to  find  a  good  sampling 
distribution.  The  sampling  algorithm  could  be  very 
slow  to  converge  or  in  some  cases  with  unlikely 
evidence,  it  may  not  converge  even  with  a  huge 
sample  size.  In  recent  years,  applying  Pearl’s 
message  passing  algorithm  to  the  network  with 
loops,  so-called  “loopy  belief  propagation”  (LBP) 

[22,  29],  has  become  very  popular  in  the  literature. 
Although  the  propagating  messages  are  not  exact, 
researchers  found  that  LBP  usually  converges,  and 
when  it  converges  it  provides  good  approximate 
results.  Due  to  its  simplicity  of  implementation 
and  good  empirical  performance,  we  propose  to 
extend  LBP  for  approximate  inference  for  hybrid 
model.  Unfortunately,  because  of  the  differences  in 
representation  and  manipulations  of  messages  with 
discrete  and  continuous  variables,  there  is  no  simple 
and  efficient  way  to  pass  messages  between  them.  In 
[30],  the  authors  use  general  nonparametric  form  to 
represent  messages  and  formulate  their  calculation  by 
numerical  integrations  for  hybrid  models.  The  method 
requires  extensive  functional  estimations,  samplings, 
and  numerical  integrations,  and  therefore  is  very 
computational  intensive. 

Under  the  framework  of  a  message  passing 
algorithm,  first  of  all,  we  need  to  find  a  general  way 
to  represent  messages.  Essentially,  messages  are 
likelihoods  or  probabilities.  In  discrete  case,  messages 
are  represented  and  manipulated  by  probability 
vectors  and  conditional  probability  tables  (CPTs) 
which  is  relatively  straightforward.  Lor  continuous 
variables,  however,  it  is  more  complicated  for 
message  representation  and  manipulation  as  they 
may  have  arbitrary  distributions.  In  this  paper,  we 
propose  to  use  the  first  two  moments,  mean  and 
variance,  of  a  probability  distribution  to  represent 


the  continuous  message  regardless  of  its  distribution. 
This  simplification  makes  message  calculation  and 
propagation  efficient  between  continuous  variables 
while  keeping  the  key  information  of  the  original 
distributions.  Lurthermore,  to  deal  with  the  possible 
arbitrary  functional  relationship  between  continuous 
variables,  a  state  estimation  method  is  needed  to 
approximate  the  distribution  of  a  random  variable 
that  has  gone  through  nonlinear  transformation. 

Several  weighted  sampling  algorithms  such  as 
particle  filtering  [1]  and  Bayesian  bootstrapping 
[2]  for  nonlinear  state  estimation  were  proposed  in 
the  literature.  However,  we  prefer  to  use  unscented 
transformation  [14,  15]  due  to  its  computational 
efficiency  and  accuracy.  Unscented  transformation 
uses  a  deterministic  sampling  scheme  and  can 
provide  good  estimates  of  the  first  two  moments 
for  the  continuous  variable  undergone  nonlinear 
transformation.  Lor  arbitrary  continuous  network, 
this  approach  we  called  unscented  message  passing 
(UMP)  works  very  well  [25],  But  in  the  hybrid 
model,  message  propagation  between  discrete  and 
continuous  variables  is  not  straightforward  due  to  their 
different  formats.  To  deal  with  this  issue,  we  propose 
to  apply  conditioning.  Lirst  we  partition  the  original 
hybrid  BNs  into  separate,  discrete,  and  continuous 
network  segments  by  conditioning  on  discrete  parents 
of  continuous  variables  [26].  We  can  then  process 
message  passing  separately  for  each  network  segment 
before  final  integration. 

One  of  the  benefits  of  partitioning  networks  is 
to  ensure  that  there  is  at  least  one  efficient  inference 
method  applicable  to  each  network  segment.  In  hybrid 
networks,  we  assume  that  a  continuous  node  is  not 
allowed  to  have  any  discrete  child  node.  Therefore, 
the  original  networks  can  be  partitioned  into  separate 
parts  by  the  discrete  parents  of  continuous  variables. 
We  call  these  nodes  the  interface  nodes.  Each 
network  segment  separated  by  the  interface  nodes 
consists  of  purely  discrete  or  continuous  variables. 

By  conditioning  on  interface  nodes,  the  variables  in 
different  network  segments  are  independent  of  each 
other.  We  then  conduct  loopy  propagation  separately 
in  each  subnetwork.  Linally,  messages  computed  in 
different  segments  are  integrated  through  the  interface 
nodes.  We  then  estimate  the  posterior  distribution  of 
every  hidden  variable  given  evidence  in  all  network 
segments. 

The  algorithm  proposed  in  this  paper  aims  to 
tackle  nonlinear  hybrid  models.  We  believe  that  the 
proposed  combination  of  known  efficient  methods 
and  the  introduction  of  interface  nodes  for  hybrid 
network  partition  makes  the  new  algorithm  a  good 
alternative  for  inference  in  nonlinear  hybrid  models. 
The  remainder  of  this  paper  is  organized  as  follows. 
Section  II  first  reviews  Pearl’s  message  passing 
formulae.  We  then  discuss  the  message  representation 
and  manipulation  for  continuous  variable  and  how 
to  propagate  messages  between  continuous  variables 
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with  nonlinear  functional  relationship.  Section  III 
describes  the  methods  of  network  partition  and 
message  integration  by  introducing  the  concept  of 
interface  nodes.  We  show  how  message  passing  can 
be  done  separately  and  finally  integrated  together 
via  the  channel  of  interface  nodes.  Section  IV 
presents  the  algorithm  of  hybrid  message  passing 
by  conditioning.  Several  numerical  experiments  are 
presented  in  Section  V.  Finally,  Section  VI  concludes 
the  research  we  have  done  in  this  paper  and  suggests 
some  potential  future  work. 


II.  MESSAGE  PASSING:  REPRESENTATION  AND 
PROPAGATION 


Pearl’s  message  passing  algorithm  [24]  is  the  first 
exact  inference  algorithm  developed  originally  for 
polytree  discrete  BNs.  Applying  Pearl’s  algorithm  in 
the  network  with  loops  usually  provides  approximate 
answers,  and  this  method  is  called  LBP  Recall  that 
in  Pearl’s  message  passing  algorithm,  ej  and  ex 
are  defined  as  the  evidence  from  the  subnetwork 
“above”  a  node  X  and  the  subnetwork  “below”  X, 
respectively.  In  a  polytree,  any  node  X  d-separates 
the  set  of  evidence  e  into  {ex,ex}.  In  the  algorithm, 
each  node  in  the  network  maintains  two  values  called 
A  value  and  n  value.  A  value  of  a  node  X,  defined  as 


A(X)  =  P(ex  \X)  (1) 

is  the  likelihood  of  observations  e  Y  given  X.  7 r  value 
of  a  node  X,  defined  as 

tt(X)  =  P(X  |  e+)  (2) 

is  the  conditional  probability  of  X  given  ej. 

The  belief  of  a  node  X  given  all  evidence  is 
the  normalized  product  of  7 r  value  and  A  value. 

Each  node,  after  updating  its  own  belief,  sends  new 
A  message  to  its  parents  and  new  7 r  message  to 
its  children.  For  a  typical  node  X  with  m  parents 
T(TuT2,...,Tm)  and  n  children  Y(Yl,Y2,...,Yn)  as 
illustrated  in  Fig.  1,  the  conventional  propagation 
equations  of  Pearl's  message  passing  algorithm  can 
be  expressed  as  the  following  [24]: 


BEL(X)  =  a7r(X)A(X) 

n 

mx) = nvx) 


i= 1 


rW  =  Ei>(X|T)n^(JJ) 


(3) 

(4) 

(5) 


i=  1 


w  =  E  w  E  F<x  1 T)  n  *x<?k)  (6) 

X  Tk'.k^i  k^i 


7Ty.(X)  =  a 


IPs® 

k?j 


n(X) 


(7) 


Fig.  1.  Typical  node  X  with  m  parents  and  n  children. 


where  XY  (X)  is  the  A  message  node  X  receives  from 
its  child  Yj,  AX(7J)  is  the  A  message  X  sends  to  its 
parent  7J;  7tx(7 ])  is  the  7r  message  node  X  receives 
from  its  parent  7J,  7ry.(X)  is  the  tt  message  X  sends 
to  its  child  Yj ;  and  a  is  a  normalizing  constant. 

When  this  algorithm  is  applied  to  a  polytree 
network,  the  messages  propagated  are  exact  and  so  are 
the  beliefs  of  all  nodes  after  receiving  all  messages. 
For  the  network  with  loops,  we  can  still  apply  this 
algorithm  as  the  “loopy  propagation”  mentioned 
above.  In  general,  loopy  propagation  will  not  provide 
the  exact  solutions.  But  empirical  investigations  on  its 
performance  have  reported  surprisingly  good  results. 

For  discrete  variables,  messages  could  be 
represented  by  probability  vectors,  and  the  conditional 
probability  table  of  node  X  given  its  parent  T, 

P(X  |  T),  could  be  represented  by  a  matrix.  Therefore 
the  calculations  in  the  above  formulae  are  the 
product  of  vectors  and  multiplication  of  vector  and 
matrices,  which  can  be  carried  out  easily.  However, 
for  continuous  variables,  message  representation 
and  the  corresponding  calculations  are  much  more 
complicated.  First,  an  integral  replaces  summation  in 
the  above  equations.  Furthermore,  since  continuous 
variable  could  have  arbitrary  distribution  over  the 
continuous  space,  in  general  it  is  very  difficult  to 
obtain  exact  close-form  analytical  results  when 
combining  multiple  continuous  distributions.  In  order 
to  make  the  computations  feasible  while  keeping 
the  key  information,  we  use  the  first  two  moments, 
mean  and  variance,  to  represent  continuous  message 
regardless  of  the  original  distribution.  Then,  the 
product  of  different  continuous  distributions  could 
be  approximated  with  a  Gaussian  distribution.  Note 
that  for  the  continuous  case,  P(X  |  T)  is  a  continuous 
conditional  distribution,  and  it  may  involve  an 
arbitrary  function  between  continuous  variables.  To 
integrate  the  product  of  continuous  distributions  as 
shown  in  (5)  and  (6),  it  has  to  take  into  account  the 
functional  transformation  of  continuous  variables. 
Fortunately,  unscented  transformation  [14,  15] 
provides  good  estimates  of  mean  and  variance  for  the 
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continuous  variables  through  nonlinear  transformation. 
In  our  algorithm,  unscented  transformation  plays 
a  key  role  for  computing  continuous  messages. 
Specifically,  we  use  it  to  formulate  and  compute  the 
7r  and  A  messages  since  both  computations  involve  the 
conditional  probability  distribution  in  which  nonlinear 
transformation  may  be  required. 

A.  Unscented  Transformation 


and  the  associated  weights  for  these  2L  +  1  sigma 
points  are 
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Proposed  in  1996  by  Julier  and  Uhlmann  [15], 
unscented  transformation  is  a  deterministic  sampling 
method  to  estimate  mean  and  variance  of  continuous 
random  variable  that  has  undergone  nonlinear 
transformation.  Consider  the  following  problem: 
a  continuous  random  variable  x  with  mean  x  and 
covariance  matrix  Px  undergoes  an  arbitrary  nonlinear 
transformation,  written  as  y  =  g(x);  the  question  is 
how  to  compute  the  mean  and  covariance  of  y? 

From  probability  theory,  we  have 


P( y  I  \)p(x)dx. 


However,  in  general  the  above  integral  may  be 
difficult  to  compute  analytically  and  may  not  always 
have  a  close-form  solution.  Therefore,  instead  of 
finding  the  distribution,  we  retreat  to  seek  for  its 
mean  and  covariance.  Based  on  the  principle  that  it  is 
easier  to  approximate  a  probability  distribution  than  an 
arbitrary  nonlinear  function,  unscented  transformation 
uses  a  minimal  set  of  deterministically  chosen  sample 
points  called  sigma  points  to  capture  the  true  mean 
and  covariance  of  the  prior  distribution.  Those  sigma 
points  are  propagated  through  the  original  functional 
transformation  individually.  According  to  its  formulae, 
posterior  mean  and  covariance  calculated  from  these 
propagated  sigma  points  are  accurate  to  the  2nd  order 
for  any  nonlinearity.  In  the  special  case  when  the 
transformation  function  is  linear,  the  posterior  mean 
and  variance  are  exact. 

The  original  unscented  transformation  encounters 
difficulties  with  high-dimensional  variables,  so  the 
scaled  unscented  transformation  was  developed 
soon  afterward  [14].  The  scaled  unscented 
transformation  is  a  generalization  of  the  original 
unscented  transformation.  We  will  use  the  two  terms 
interchangeably,  but  both  mean  scaled  unscented 
transformation  in  the  remainder  of  this  paper. 

Now  let  us  describe  the  formulae  of  unscented 
transformation.  Assume  x  is  L-dimensional 
multivariate  random  variable.  First,  a  set  of  2L  +  1 
sigma  points  are  specified  by  the  following  formulae: 


A  =  cc(L  +  k)-L 


X  = 


'*(>=x 

<  X.=i+(y/{L  +  A)PS). 
>X.=i-(y/{L  +  X)Px)i 


i  =  0 

i  =  1,...,L 
i  =  L  +  \,...,2L 


(8) 


where  a,  (3,  n  are  scaling  parameters  and  the 
superscripts  “( m ),”  “(c)”  indicate  the  weights 
for  computing  posterior  mean  and  covariance, 
respectively.  The  values  of  scaling  parameters  could 
be  chosen  by  0  <  a  <  1,  /?  >  0,  and  k  >  0.  It  has  been 
shown  empirically  that  the  specific  values  chosen 
for  the  parameters  are  not  critical  because  unscented 
transformation  is  not  sensitive  to  those  parameters. 

We  choose  a  =  0.8,  (3  =  2  (optimal  for  Gaussian  prior 
[14]),  and  n  =  0  in  all  of  our  experiments. 

After  the  sigma  points  are  selected,  they  are 
propagated  through  the  functional  transformation: 

y,=g(X,)  i  =  0, . . . ,  2L.  (10) 

Finally,  the  posterior  mean  and  covariance  are 
estimated  by  combining  the  propagated  sigma  points 
as  follows: 


2L 


i= 0 
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In  short,  we  denote  the  unscented  transformation 
for  X  undergoing  a  functional  transformation  Y  = 
f(X )  as  the  following: 

(T.mu,  y.cov)  =  UT(x^lYy  (14) 

We  demonstrate  the  unscented  transformation 
by  a  simple  two-dimension  Gaussian  example.  Let 
x  =  [X|  x2  ]  with  mean  and  covariance  matrix  given  as 


"3‘ 

■  i  -r 

X  = 

,  Px  = 

.1. 

.-1  2  . 

In  order  to  show  the  robustness  of  unscented 
transformation,  we  choose  a  set  of  functions  with 
severe  nonlinearity  shown  below: 

A  i  =  log(xf)cos(x2),  y2  =  ^exp(x2)sin(x1x2). 

The  true  posterior  statistics  are  approximated  very 
closely  by  brute  force  Monte  Carlo  simulation 
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Fig.  2.  Demonstration  of  unscented  transformation,  (a)  Prior  distribution,  (b)  After  nonlinear  transformation. 


using  100,000  sample  points  drawn  from  the 
prior  distribution  and  then  propagated  through  the 
nonlinear  mapping.  We  compare  them  with  the 
estimates  calculated  by  unscented  transformation 
using  only  5  sigma  points.  Fig.  2  shows  that  the 
mean  calculated  by  transformed  sigma  points  is 
very  close  to  the  true  mean  and  that  the  posterior 
covariance  seems  consistent  and  efficient  because 
the  sigma-point  covariance  ellipse  is  larger  but 
still  tight  around  the  true  posterior  covariance 
ellipse. 


B.  Unscented  Message  Passing 


Now  let  us  take  a  closer  look  at  Pearl’s  general 
message  propagation  formulae  shown  in  (3)— (7).  In 
recursive  Bayesian  inference,  7 r  message  represents 
prior  information  and  A  message  represents  evidential 
support  in  the  form  of  a  likelihood  function.  Equations 
(3),  (4),  and  (7)  are  essentially  the  combination  of 
different  messages  by  multiplication.  They  are  similar 
to  the  data  fusion  concept  where  estimates  received 
from  multiple  sources  are  combined. 

Under  the  assumption  of  Gaussian  distribution, 
the  fusion  formula  is  relatively  straightforward  [3]. 
Specifically,  (3),  (4),  and  (7)  can  be  rewritten  in 
terms  of  the  first  two  moments  of  the  probability 
distributions  as  the  following: 
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7r(Ar).cov  A(X).cov 


mu  =  cov 


7r(Z).mu  A(Ar).mu 


7r(X).cov  A(X).cov 


(15) 


AGO  % 


COV 


-  £ 


XY  (X).cov 
/  =  !  UV  7 


mu  =  cov 


"  Ay.OO.mu 
AF  (X).cov 

j= 1  yo 


(16) 


7iv.  00  \ 


COV  = 


£ 


7r(Ar).cov  /-^Ay(Z).cov 
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mu  =  cov 


7r(Z).mu  \  —  AjjOO.mu 
7r(X).cov  +  ^  Ayj  (Ar).cov 


(17) 

where  mu,  cov  stand  for  corresponding  mean  and 
covariance,  respectively. 

Equation  (5)  computes  the  7 r  value  for  node 
X.  Analytically,  this  is  equivalent  to  treating  X  as 
a  functional  transformation  of  T  and  the  function 
is  the  one  defined  in  CPD  of  X  denoted  as  h(X). 
Technically,  we  take  T  as  a  multivariate  random 
variable  with  a  mean  vector  and  a  covariance  matrix; 
then  by  using  unscented  transformation,  we  obtain 
an  estimate  of  mean  and  variance  of  X  to  serve 
as  the  7r  value  for  node  X.  In  (5),  7rx(7 7)  is  the  n 
messages  sending  to  X  from  its  parent  1],  which 
is  also  represented  by  mean  and  covariance.  By 
combining  all  the  incoming  7rx(77)  messages,  we  can 
estimate  the  mean  vector  and  covariance  matrix  of 
T.  Obviously,  the  simplest  way  is  to  view  all  parents 
as  independent  variables;  then  combine  their  means 
into  a  mean  vector,  and  place  their  variances  at  the 
diagonal  positions  to  form  a  diagonal  covariance 
matrix.2  With  that,  we  can  compute  the  7r  value  of 
node  X  by 


(7r(X).mu,7r(Z).cov)  =  UT  (T^ix)  .  (18) 


Similarly  but  a  bit  more  complicated,  (6)  computes 
the  A  message  sending  to  its  parent  (27)  from  node 
X.  Note  here  that  we  integrate  out  X  and  all  of  its 
parents  except  the  one  (77)  we  are  sending  A  message 
to.  Theoretically,  this  is  equivalent  to  regarding  77 
as  the  functional  transformation  of  X  and  T\77.  It 


2  This  is  actually  how  the  original  loopy  algorithm  works  and  why 
it  is  not  exact.  To  improve  the  algorithm,  we  can  estimate  the 
correlations  between  all  parents  and  include  them  in  the  covariance 
matrix  of  T. 
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is  necessary  to  mention  that  the  function  used  for 
transformation  is  the  inverse  function  of  the  original 
one  specified  in  P{X  |  T)  with  '/■  as  the  independent 
variable.  We  denote  this  inverse  function  as  v(X,T\2j). 
Note  that  in  practical  problems,  the  original  function 
may  not  be  invertible,  or  its  inverse  function  may  not 
be  unique.  In  such  a  case,  we  need  additional  steps 
to  apply  the  method.  In  this  paper,  we  assume  the 
inverse  function  is  unique  and  always  available.  To 
compute  the  message,  we  first  augment  X  with  T\7J  to 
obtain  a  new  multivariate  random  variable  called  TX; 
then  the  mean  vector  and  covariance  matrix  of  TX  are 
estimated  by  combining  MX )  and  ux(Tk)(k  f  i).  After 
applying  unscented  transformation  to  TX  with  the 
new  inverse  function  v(X,T\7J),  we  obtain  an  estimate 
of  the  mean  and  variance  for  Tt  serving  as  the  Ax(7 J) 
message  as  below: 

(^x(TJ)-mu,  Ax(7J).cov)  =  UT  .  (19) 

With  (15) — (19),  we  can  now  compute  all  messages 
for  continuous  variables.  As  one  may  notice, 
unscented  transformation  plays  a  key  role  here.  This 
is  why  we  call  it  UMP  for  continuous  BNs. 

So  far,  we  have  summarized  message 
representation  and  propagation  for  discrete  and 
continuous  variables,  respectively.  However,  for  the 
hybrid  model,  we  have  to  deal  with  the  messages 
passing  between  both  types  of  variables.  Since  they 
are  in  different  formats,  messages  cannot  be  integrated 
directly.  As  mentioned  in  Section  I,  our  approach  is 
to  partition  the  original  network  before  propagating 
messages  between  them. 

III.  NETWORK  PARTITION  AND  MESSAGE 

INTEGRATION  FOR  HYBRID  MODEL 

First  of  all,  as  mentioned  earlier,  we  assume  that 
a  discrete  node  can  only  have  discrete  parents  in  the 
hybrid  models,  which  implies  continuous  variable 
cannot  have  any  discrete  child  node. 

Definition  1  In  a  hybrid  BN,  a  discrete  variable  is 
called  a  discrete  parent  if  and  only  if  it  has  at  least 
one  continuous  child  node. 

It  is  well  known  that  BN  has  an  important  property 
that  every  node  is  independent  of  its  nondescendant 
nodes  given  its  parents.  Therefore  the  following 
theorem  follows. 

THEOREM  1  All  discrete  parents  in  the  hybrid  BN 
model  can  partition  the  network  into  independent 
network  segments,  each  having  either  purely  discrete 
or  purely  continuous  variables.  We  call  the  set  of  all 
discrete  parents  in  the  hybrid  network  the  interface 
nodes.  In  other  words,  the  interface  nodes  “d- separate” 
the  network  into  different  network  segments. 


Fig.  3.  Demonstration  of  interface  nodes  and  network  partition. 

It  is  obvious  that  the  variables  in  different 
segments  of  the  network  are  independent  of  each 
other  given  the  interface  nodes.  An  example  is 
shown  in  Fig.  3  where  a  13-node  hybrid  model  is 
presented.  Following  the  convention,  we  use  a  square 
or  rectangle  to  depict  the  discrete  variable  and  a  circle 
or  ellipse  to  depict  the  continuous  variable.  As  can 
be  seen,  K,  A,  and  C  are  the  interface  nodes  in  this 
example.  By  representing  the  arcs  between  discrete 
parents  and  their  continuous  children  as  dot  lines,  four 
independent  network  segments  are  formulated — two 
discrete  parts  (H,  B,  F,  K,  G  and  J,  A,  C )  and  two 
continuous  parts  ( T ,  R,  S  and  X,  Y). 

After  partitioning  the  network  with  the  interface 
nodes,  we  choose  the  most  appropriate  inference 
algorithm  for  each  network  segment.  In  fact,  we  can 
also  combine  some  segments  together  if  the  same 
algorithm  works  for  all  of  them.  The  purpose  of 
introducing  the  interface  nodes  is  to  facilitate  the 
network  partition  so  that  at  least  one  algorithm  could 
be  applicable  to  each  segment.  In  general,  separate 
message  passing  in  either  discrete  or  continuous 
network  segment  is  always  doable.  Typically,  the 
continuous  network  segment  with  nonlinear  and/or 
non-Gaussian  CPDs  is  the  most  difficult  one  to  deal 
with.  In  such  case,  we  apply  UMP  presented  in 
Section  IIB  for  approximate  solutions. 

Finally,  we  need  to  summarize  the  prior  and 
evidence  information  for  each  network  segment  and 
encode  it  as  messages  to  be  passed  between  network 
segments  through  the  interface  nodes.  This  is  similar 
to  general  message  passing  but  requires  message 
integrations  between  different  network  segments. 

A.  Message  Integration  for  Hybrid  Model 

For  a  hybrid  model,  without  loss  of  generality, 
let  us  assume  that  the  network  is  partitioned  into 
two  parts  denoted  as  V  and  C.  Part  V  is  a  discrete 
network  and  it  is  solvable  by  appropriate  algorithms 
such  as  junction  tree  or  discrete  loopy  propagation. 
Part  C  is  an  arbitrary  continuous  network.  Let  us 
denote  the  observable  evidence  in  part  V  as  Ed, 
and  the  evidence  from  C  as  Ec.  Therefore  the  entire 
evidence  set  E  consists  of  Ed  and  Ec.  As  mentioned 
before,  given  interface  nodes,  variables  from  the  two 
network  segments  are  conditional  independent  of  each 
other.  The  evidence  from  part  V  affects  the  posterior 
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probability  of  hidden  nodes  in  part  C  and  vice  versa 
only  through  the  channel  of  the  interface  nodes. 

We  therefore  summarize  the  prior  and  evidence 
information  of  each  network  segment  and  encode 
them  as  either  n  or  A  value  at  the  interface  nodes. 
Assuming  that  the  set  of  interface  nodes  between  two 
network  segments  is  I,  then  the  two  messages  are: 

A(I)  =  P(EC  1 1)  and  7r(I)  =  P(l  \  Ed).  These  values  are 
to  be  passed  between  network  segments  to  facilitate 
information  integration.  As  in  Pearl’s  algorithm,  this 
approach  can  be  easily  integrated  with  the  UMP-BN 
loopy  algorithm  mentioned  above  in  a  unified  manner. 

We  use  the  following  concrete  example  to  illustrate 
how  to  integrate  messages  from  different  network 
segments.  As  can  be  seen  in  Fig.  4,  synthetic  hybrid 
model- 1  has  K  as  the  interface  node  dividing  the 
network  into  a  discrete  part  consisting  of  H,  B,  F, 

K,  G  and  a  continuous  part  consisting  of  T,  R,  S,  M, 
Y.  For  the  purpose  of  illustration,  let  us  assume  all 
discrete  nodes  are  binary  and  all  continuous  nodes  are 
scalar  Gaussian  variables. 

Suppose  the  leaf  nodes  G,  M,  Y  are  observable 
evidence.  We  first  focus  on  the  continuous  segment. 

In  this  step,  we  compute  the  A  message  sending  to 
the  interface  node  K  from  continuous  evidence.  And 
conditioning  on  each  possible  state  of  K,  we  estimate 
the  posterior  distributions  for  all  hidden  continuous 
variables  given  continuous  evidence.  Under  Gaussian 
assumption,  these  posterior  distributions  are 
represented  by  means  and  variances  and  they  are 
intermediate  results  that  will  be  combined  after  we 
obtain  the  a  posterior  probability  distribution  of  the 
interface  node  K  given  all  evidence.  Probabilities 
of  all  possible  states  of  K  are  served  as  the  mixing 
weights,  similar  to  computing  the  mean  and  variance 
of  a  Gaussian  mixture. 

Given  K ,  it  is  straightforward  to  compute  the 
likelihood  of  continuous  evidence  M  =  m,  Y  =  y 
because  we  can  easily  estimate  the  conditional 
probability  distribution  of  evidence  node  given 
interface  nodes  and  other  observations.  For  example, 
let 

P(M  =  m,Y  =  y  \  K  =  1)  =  a 


P(M  =  m,Y  =y  \K  =  2)  =  b. 

Then  to  incorporate  the  evidence  likelihood  is 
equivalent  to  adding  a  binary  discrete  dummy  node  as 
the  child  of  the  interface  node  K  with  the  conditional 


probability  table  shown  as  the  following: 


K 

Dummy 

1  2 

1 

aa  1  -aa 

2 

ab  1  -ab 

where  a  is  a  normalizing  constant. 


By  setting  “Dummy”  to  be  observed  as  state  1,  the 
entire  continuous  segment  could  be  replaced  by  the 
node  Dummy.  Then  the  original  hybrid  BN  can  be 
transformed  into  a  purely  discrete  model  shown  in 
Fig.  5  in  which  Dummy  integrates  all  of  the 
continuous  evidence  information. 

The  second  step  is  to  compute  the  posterior 
distributions  for  all  hidden  discrete  nodes  given 
G  =  g.  Dummy  =  1 .  We  have  several  algorithms  to 
choose  for  inference  depending  on  the  complexity  of 
the  transformed  model.  In  general,  we  can  always 
apply  discrete  loopy  propagation  algorithm  to 
obtain  approximate  results  regardless  of  network 
topology.  Note  that  the  posterior  distributions  of  the 
discrete  nodes  have  taken  into  account  all  evidence 
including  the  ones  from  continuous  segment  via  the 
Dummy  node.  However,  we  need  to  send  the  updated 
information  back  to  the  continuous  subnetwork  via 
the  set  of  interface  nodes.  This  is  done  by  computing 
the  joint  posterior  probability  distribution  of  the 
interface  nodes  denoted  as  F(I  |  E).  Essentially,  it  is 
the  7T  messages  to  be  sent  to  the  continuous  network 
segment. 

With  the  messages  encoded  in  the  interface  nodes, 
the  last  step  is  to  go  back  to  the  continuous  segment 
to  compute  the  a  posterior  probability  distributions  for 
all  hidden  continuous  variables.  Recall  that  in  the  first 
step,  for  any  hidden  continuous  variable  X,  we  already 
have  P(X  |  I ,EC)  computed  and  saved.  The  following 
derivation  shows  how  to  compute  P(X  |  E): 

P(X  |  E)  =  P(X  |  Ec,Ed) 

=  Y,P(X,I\Ec,Ed) 

i 

=  |  l,Ec,Ed)P( I 1  Ec,Ed) 

i 

=  5>(X  |  I,£C)R(I  |  E).  (20) 

i 
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The  fourth  equality  is  due  to  the  fact  that  the  set  of 
interface  node  d-separate  the  node  X  with  Ed. 

Assuming  given  an  instantiation  of  the  set  of 
interface  nodes  I  =  i,  P(X  j  I  =  i,Ec)  is  a  Gaussian 
distribution  with  mean  x,  and  variance  of.  Then  (20) 
is  equivalent  to  computing  the  probability  density 
function  of  a  Gaussian  mixture  with  P(I  =  i  |  E)  as 
the  weighting  factors.  Denoting  P(I  =  i  |  E)  as  pt, 
the  mean  x  and  the  variance  a],  of  P(X  |  E)  can  be 
computed  as  the  following  [3,  p.  56]: 


i 

(21) 

J2piai  +  12  -X2' 

(22) 

i  i 


Through  the  above  three  steps,  we  successfully 
integrate  messages  from  different  subnetworks  to 
obtain  the  approximate  posterior  marginal  distribution 
for  both  continuous  and  discrete  hidden  variables 
given  all  evidence.  There  are  two  approximations  in 
the  algorithm.  One  is  from  loopy  propagation  method 
itself.  Another  one  is  that  we  approximate  continuous 
variable  as  Gaussian  distributed  as  we  only  use  the 
first  two  moments  to  represent  continuous  messages. 
However,  it  provides  promising  performance  as  seen 
in  the  numerical  experiment  results. 

IV.  HYBRID  MESSAGE  PASSING  ALGORITHM 

We  have  presented  separate  message  passing  in 
either  discrete  or  continuous  network  segment  and 
message  integration  in  hybrid  model  via  interface 
nodes.  In  this  section,  we  summarize  the  general 
algorithm  of  message  passing  for  hybrid  BNs  as 
shown  in  Table  I. 

In  order  to  incorporate  evidence  information,  we 
allow  a  node  to  send  a  A  message  to  itself.  For  a 
discrete  network,  we  initialize  the  messages  by  letting 
all  evidence  nodes  send  to  themselves  a  vector  of  a 
“1”  for  observed  state  and  Os  for  other  states.  All 


TABLE  I 

Hybrid  Message  Passing  Algorithm  for  General  Mixed  BN 

Algorithm:  Hybrid  Message  Passing  for  General  Mixed  BN 

(HMP-BN). 

Input:  General  hybrid  BN  given  a  set  of  evidence. 

Output:  Posterior  marginal  distributions  of  all  hidden  nodes. 

1.  Determine  the  interface  nodes  and  partition  the  network 
into  independent  segments  with  interface  nodes.  Choose  the 
appropriate  inference  algorithm  for  each  network  segment. 

2.  Continuous  network  segment:  compute  the  A  message 
sending  to  the  interface  nodes  and  the  intermediate 
posterior  distribution  of  the  hidden  continuous  variables 
given  the  interface  nodes  and  the  local  evidence. 

3.  Transform  the  original  network  into  an  equivalent  discrete 
model  with  a  dummy  node  added  as  a  child  of  the 
interface  nodes.  This  dummy  discrete  node  carries  the  A 
message  from  continuous  evidence  to  the  interface  nodes. 

4.  Compute  the  posterior  distribution  for  every  hidden  discrete 
variable  using  the  transformed  discrete  model.  The  joint 
posterior  probability  table  of  the  interface  nodes  is  saved  as 
the  7r  message  to  be  sent  back  to  the  continuous  network 
segment. 

5.  Compute  the  posterior  distribution  for  every  hidden 
continuous  variable  given  all  evidence  by  integrating  the  7r 
message  using  (20). 


other  messages  are  initialized  as  vectors  of  Is.  For 
continuous  network,  a  message  is  represented  by 
mean  and  variance.  We  initialize  the  messages  for 
all  continuous  evidence  nodes,  sending  themselves 
as  the  one  with  the  mean  equal  to  the  observed  value 
and  the  variance  equal  to  zero.  All  other  messages 
in  continuous  network  are  initialized  as  uniform, 
specifically,  zero-mean  and  infinity  variance  (the 
so-called  “diffusion  prior”).  Then  in  each  iteration, 
every  node  computes  its  own  belief  and  outgoing 
messages  based  on  the  incoming  messages  from  its 
neighbors.  We  assess  the  convergence  by  checking  if 
any  belief  change  is  less  than  a  prespecified  threshold 
(for  example,  10  4).  We  use  parallel  updating  for  each 
node  until  the  messages  are  converged. 

V.  NUMERICAL  EVALUATION 
A.  Experiment  Method 

We  use  two  synthetic  hybrid  models  for 
experiments.  One  is  shown  in  Fig.  4  as  mentioned  in 
Section  IIIA  called  GHM-1.  GHM-1  has  one  loop  in 
each  network  segment,  respectively,  (partitioned  by 
the  interface  node  K).  Another  experiment  model  is 
shown  in  Fig.  6  called  GHM-2.  GHM-2  has  multiple 
loops  in  the  continuous  segment. 

For  GHM-1,  we  assume  that  the  leaf  nodes  G,M,Y 
are  observable  evidence.  We  model  its  continuous 
segment  as  a  linear  Gaussian  network  given  the 
interface  node  K.  Therefore  the  original  network  is  a 
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CLG  so  that  the  exact  inference  algorithm  (junction 
tree)  can  be  used  to  provide  the  true  answer  as  a 
golden  standard  for  performance  comparison.  The 
CPTs  and  CPDs  for  nodes  in  GHM-1  are  randomly 
specified. 

Note  that  our  algorithm  can  handle  general 
arbitrary  hybrid  model,  not  just  CLG.  GHM-2  is 
designed  specifically  to  test  the  algorithm  under  the 
situation  where  nonlinear  CPDs  are  involved  in  the 
model.  The  structure  of  the  continuous  segment  in 
GHM-2  is  borrowed  from  [17]  in  which  the  author 
proposed  junction  tree  algorithm  for  CLG.  The 
discrete  nodes  in  the  GHM-2  are  binary,  and  we 
randomly  specify  the  CPTs  for  them  similar  to  the  one 
in  GHM-1.  But  the  CPDs  for  the  continuous  nodes  are 
deliberately  specified  using  severe  nonlinear  functions 
shown  below  to  test  the  robustness  of  the  algorithm: 

F  ~  A/X— 10,3) 

W-A/Y  100,10) 

B  |  K  =  1  ~  A/"(50,5) 

B|K  =  2-Af(60,5) 

E  ~  Af(W  +  2F,  1) 

C~A/"(e^,3) 

D  ~  A f(VW  x  log(£)  -  6,5) 
Min~A/’(V/W  +  6,3) 

Mout  ~  W(0.5  x  D  x  Min,  5) 

L~Af(— 5  xD,5). 

We  assume  that  the  evidence  set  in  the  GHM-2  is 
{//,C,Mout,L}.  Since  no  exact  algorithm  is  available 
for  such  model,  for  comparison  purposes,  we  use  the 
brute  force  sampling  method,  likelihood  weighting, 
to  obtain  an  approximate  true  solution  with  a  large 
number  of  samples  (20  million  samples). 

In  our  experiments,  we  first  randomly  sample 
the  network  and  clamp  the  evidence  nodes  by  their 
sampled  value.  Then  we  run  HMP-BN  to  compute 
the  posterior  distributions  for  the  hidden  nodes. 

It  is  important  to  mention  that  in  both  discrete 
and  continuous  network  segments,  we  implement 
HMP-BN  using  loopy  algorithms  to  make  it  general, 
although  junction  tree  could  be  used  in  network 
segment  whenever  it  is  applicable.  In  addition,  we 
run  LW  using  as  many  samples  as  it  can  generate 
within  roughly  the  same  amount  of  time  HMP-BN 
consumes.  There  are  10  random  runs  for  GHM-1  and 
5  random  runs  for  GHM-2.  We  compare  the  average 
Kullback-Leibler  (KL)  divergences  of  the  posterior 
distributions  obtained  by  different  algorithms. 

Given  unlikely  evidence,  it  is  well  known  that 
the  sampling  methods  converge  very  slowly  even 
with  a  large  sample  size.  We  use  GHM-1  to  test 
the  robustness  of  our  algorithm  in  this  case  because 


Fig.  7.  Posterior  probability  of  hidden  discrete  variables  in  two 
typical  runs. 

junction  tree  can  provide  the  ground  true  for  GHM-1 
regardless  of  the  evidence  likelihood.  We  generate 
10  random  cases  with  evidence  likelihood  between 
10  5  ~  10  15  and  run  both  HMP-BN  and  LW  to 
compare  the  performances. 

B.  Experiment  Results 

For  model  GHM-1,  there  are  4  hidden  discrete 
nodes  and  3  hidden  continuous  nodes.  Fig.  7 
illustrates  the  posterior  probabilities  of  hidden  discrete 
nodes  computed  by  junction  tree,  HMP-BN,  and  LW 
in  two  typical  runs.  Since  GHM-1  is  a  simple  model 
and  we  did  not  use  unlikely  evidence,  both  HMP-BN 
and  LW  perform  well. 

For  continuous  variables  in  GHM-1,  Fig.  8  shows 
the  performance  comparisons  in  means  and  variances 
of  the  posterior  distributions  for  the  hidden  continuous 
nodes  in  all  of  the  10  runs.  The  normalized  error  is 
defined  as  the  ratio  of  the  absolute  error  over  the 
corresponding  true  value.  From  the  figure,  it  is  evident 
that  HMP-BN  provides  accurate  estimates  of  means, 
while  the  estimated  variances  deviate  from  the  true 
somewhat  but  HMP-BN  is  still  better  than  LW  in  most 
cases. 

We  then  demonstrate  the  robustness  of  HMP-BN 
by  testing  its  performance  given  unlikely  evidence 
shown  in  Fig.  9.  In  this  experiment,  10  random  sets 
of  evidence  are  chosen  with  likelihood  between 
10~5  and  10~15.  As  can  be  seen,  HMP-BN  performs 
significantly  better  than  LW  in  this  case.  The  average 
KL  divergences  are  consistently  small  with  the 
maximum  value  less  than  0.05.  This  is  not  surprising 
because  LW  uses  the  prior  to  generate  samples  so  that 
it  hardly  hits  the  area  close  to  the  observations. 

We  summarize  the  performance  results  with 
GHM-1  in  Table  II.  Note  that  given  unlikely  evidence, 
the  average  KL  divergence  by  HMP-BN  is  more  than 
one  order  of  magnitude  better  than  LW. 

In  GHM-2,  due  to  the  nonlinear  nature  of 
the  model,  no  exact  method  exists  to  provide  the 
benchmark.  We  use  LW  with  20  million  samples 
to  obtain  an  approximation  of  the  true  value.  We 
implemented  five  simulation  runs  with  randomly 
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Fig.  8.  GHM-1  Performance  comparison  for  10  random  runs  (ground  true  is  provided  by  junction  tree),  (a)  Mean  comparison. 

(b)  Variance  comparison. 


1e-015  te-013  le-011  le-009  le-007  1e-005 


evidence  likelihood 

Fig.  9.  GHM-1:  Performance  comparison  given  unlikely 
evidence. 

sampled  evidence.  In  this  experiment,  we  adopt  our 
newly  developed  algorithm  UMP-BN  for  inference  in 
continuous  network  segment  [25].  Fig.  10  shows  the 
performance  comparison  in  means  and  variances  of 
the  posterior  distribution  for  the  hidden  continuous 
variables.  Also,  Table  III  summarizes  the  average 
KL  divergences  in  testing  GHM-2.  From  the  data, 
we  see  that  HMP-BN  combining  with  UMP-BN 
applied  in  the  continuous  subnetwork  produces  very 
good  results.  In  this  nonlinear  model  with  the  normal 
evidence,  the  new  algorithm  performs  much  better 
than  LW  despite  its  advantages  of  being  a  model-free 
algorithm.  However,  since  there  is  only  one  interface 
node  in  these  models,  implementing  HMP-BN  is 
relatively  simple. 

C.  Complexity  of  HMP-BN 

In  general,  when  there  are  multiple  interface 
nodes,  HMP-BN  computes  the  posterior  distributions 
of  hidden  continuous  variables  given  continuous 
evidence,  conditioned  on  every  combination  of 


TABLE  II 

Average  KL-Divergence  Comparison  in  Testing  GHM-I 


Average 

Normal  Evidence 

Unlikely  Evidence 

KL  divergence 

>  10~5 

o 

1 

o 

1 

HMP-BN 

0.0011 

0.0108 

LW 

0.0052 

0.67 

TABLE  III 

Average  KL-Divergence  Comparison  in  Testing  GHM-2 

Average  KL  Divergence 

HMP-BN  0.0056 

LW  0.0639 

instantiations  of  all  interface  nodes.  So  the  complexity 
of  the  algorithm  is  highly  dependent  on  the  size 
of  interface  nodes.  To  assess  the  complexity  of 
HMP-BN,  we  conducted  a  random  experiment  using 
network  structure  borrowed  from  the  ALARM  model 
[4]  as  shown  in  Fig.  1 1  in  which  there  are  37  nodes. 
We  randomly  selected  each  node  to  be  discrete  or 
continuous  with  only  a  requirement  that  continuous 
variable  cannot  have  any  discrete  child  node.  In  this 
experiment,  the  average  number  of  interface  nodes 
was  about  12.  HMP-BN  still  provided  good  estimates 
of  the  posterior  distributions  but  it  took  a  much  longer 
time  than  the  one  with  only  one  interface  node.  If  we 
have  n  interface  nodes  Kl,K1,...,Kn  with  number  of 
states  nl,n2,-..,nk,  respectively,  the  computational 
complexity  of  HMP-BN  is  proportional  to  0{nl  x 
n2  x  «3  •  •  •  x  nk).  This  implies  that  our  algorithm  is 
not  scalable  for  a  large  number  of  interface  nodes. 
However,  our  goal  is  not  to  propose  an  algorithm  for 
all  models  (NP-hard  in  general)  and  we  suspect  that 
it  is  rare  to  have  a  large  number  of  interface  nodes  in 
most  practical  models.  Even  with  the  considerable  size 
of  interface  nodes,  HMP-BN  provides  good  results 
within  a  reasonable  time  while  the  stochastic  sampling 
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Fig.  10.  GHM-2:  Performance  comparison  for  5  random  runs  (the  reference  base  is  provided  by  LW  with  20  millions  samples). 

(a)  Mean  comparison,  (b)  Variance  comparison. 
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Fig.  1 1 .  ALARM:  network  constructed  by  medical  expert  for  monitoring  patients  in  intensive  care. 


methods  can  perform  very  poorly  using  the  same 
amount  of  time.  In  addition,  there  are  several  ways 
to  reduce  the  computational  burden  such  as  assuming 
that  some  interface  nodes  with  small  correlations 
are  independent  of  each  other.  Nevertheless,  this 
is  beyond  the  scope  of  the  paper  and  could  be  an 
interesting  topic  for  future  research. 

VI.  CONCLUSION 

In  this  paper,  we  develop  a  hybrid  propagation 
algorithm  for  general  BNs  with  mixed  discrete  and 


continuous  variables.  In  the  algorithm,  we  first 
partition  the  network  into  discrete  and  continuous 
segments  by  introducing  the  interface  nodes.  We  then 
apply  message  passing  for  each  network  segment 
and  encode  the  updated  information  as  messages  to 
be  exchanged  between  segments  through  the  set  of 
interface  nodes.  Finally  we  integrate  the  separate 
messages  from  different  network  segments  and 
compute  the  a  posterior  distributions  for  all  hidden 
nodes.  The  preliminary  simulation  results  show  that 
the  algorithm  works  well  for  hybrid  BN  models. 
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The  main  contribution  of  this  paper  is  to  provide 
a  general  framework  for  inference  in  hybrid  model. 
Based  on  the  principle  of  decomposition  and 
conditioning,  we  introduce  the  set  of  interface  nodes 
to  partition  the  network.  Therefore  it  is  possible  to 
apply  exact  inference  algorithms  such  as  junction 
tree  to  some  applicable  network  segments  which 
enables  the  integration  of  different  efficient  algorithms 
from  multiple  subnetworks.  For  complicated  network 
segment  such  as  the  one  with  nonlinear  and/or 
non-Gaussian  variables,  we  provide  options  to  use  a 
loopy-type  message  passing  algorithm. 

Although  the  bottleneck  of  our  algorithm  is  the 
size  of  interface  nodes,  we  believe  that  HMP-BN  is  a 
good  alternative  for  nonlinear  and/or  non-Gaussian 
hybrid  models  since  no  efficient  algorithm  exists 
for  this  case  (as  far  as  we  know  from  the  literature), 
especially  given  unlikely  evidence.  We  are  currently 
exploring  another  idea  of  propagating  messages 
directly  between  different  types  of  nodes  without 
network  partition  or  interface  nodes.  However,  it  is 
beyond  the  scope  of  the  current  paper. 

Note  that  the  focus  of  this  paper  is  on  developing 
a  unified  message  passing  algorithm  for  general 
hybrid  networks.  While  the  algorithm  works  well 
to  estimate  the  means  and  variances  for  the  hidden 
continuous  variables,  the  true  posterior  distributions 
may  have  multiple  modes.  In  practice,  it  might  be 
more  important  to  know  where  the  probability  mass 
is  than  just  knowing  mean  and  variance.  One  idea  for 
future  research  is  to  utilize  the  messages  computed  in 
HMP-BN  to  obtain  a  good  importance  function  and 
apply  importance  sampling  to  estimate  the  probability 
distributions.  Another  future  research  direction  is  to 
extend  the  hybrid  algorithm  to  the  general  BN  models 
without  restriction  of  node  ordering,  such  as  to  allow 
continuous  parents  for  discrete  variables.  If  successful, 
it  would  be  a  significant  step  forward. 
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The  theoretical  fundamentals  of  distributed  information  fusion 
have  been  developed  over  the  past  two  decades  and  are  now 
fairly  well  established.  However,  practical  applications  of  these 
theoretical  results  to  dynamic  sensor  networks  have  remained 
a  challenge.  There  has  been  a  great  deal  of  work  in  developing 
distributed  fusion  algorithms  applicable  to  a  network  centric 
architecture.  In  general,  in  a  distributed  system  such  as  ad  hoc 
sensor  networks,  the  communication  architecture  is  not  fixed.  In 
those  cases,  the  distributed  fusion  approaches  based  on  pedigree 
information  may  not  scale  because  of  limited  communication 
bandwidth.  In  this  paper,  we  focus  on  scalable  fusion  algorithms 
and  conduct  analytical  performance  evaluation  to  compare  their 
performance.  The  goal  is  to  understand  the  performance  of  these 
algorithms  under  different  operating  conditions.  Specifically,  we 
evaluate  the  performance  of  channel  filter  fusion,  naive  fusion, 
Chemoff  fusion,  Shannon  fusion,  and  Bhattacharyya  fusion 
algorithms.  We  also  compare  their  performance  to  “optimal” 
centralized  fusion  under  a  specific  communication  pattern.  The 
results  show  that  the  channel  filter  fusion,  representing  a  first 
order  approximation  to  the  information  graph  fusion,  is  the  only 
“consistent”  fusion  algorithm. 
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A  distributed  data  fusion  system  consists  of 
a  network  of  sensors  and  processors  that  may  be 
colocated  with  the  sensors.  Sensors  generate  data 
by  observing  the  environment.  Processors  process 
local  sensor  data  and  fuse  data  from  other  sensors  or 
processors.  The  performance  of  a  distributed  fusion 
system  over  a  network  depends  on  three  factors:  the 
network  architecture,  the  reliability  of  communication 
links  within  the  network,  and  applicable  fusion 
algorithms.  Even  though  the  network  architecture 
may  be  fixed  and  known,  adaptive  communication 
strategies  and  possible  communication  link  failures 
will  result  in  a  dynamically  changing  communication 
structure  among  the  fusion  nodes.  Thus,  a  distributed 
fusion  algorithm  is  not  really  practical  unless  it  can 
handle  a  dynamic  communication  structure. 

There  has  been  a  great  deal  of  work  in  developing 
distributed  fusion  algorithms  applicable  to  a 
network  centric  architecture  [1-5].  However,  most 
of  these  algorithms  have  been  designed  for  fixed 
communication  structures  and  may  not  be  practical  for 
distributed  systems  such  as  ad  hoc  sensor  networks 
where  the  communication  architecture  changes 
dynamically  [6].  In  particular,  the  distributed  fusion 
algorithm  based  on  the  information  graph  approach 
[7]  was  developed  to  optimally  combine  information 
from  multiple  nodes  by  maintaining  information 
pedigree  and  using  it  to  avoid  any  double  counting 
of  information.  However,  when  the  communication 
structure  changes  in  real  time,  this  algorithm  may 
not  scale  because  of  its  requirements  to  carry  long 
pedigree  information  for  decorrelation. 

In  this  paper,  we  focus  on  several  scalable  fusion 
algorithms  and  analytically  compare  their  performance 
through  steady-state  estimate  error  prediction.  To 
demonstrate  our  performance  analysis  approach,  we 
use  a  nominal  three-node  fusion  processing  scenario 
with  cyclic  communications  as  shown  in  Fig.  1. 

We  conduct  extensive  simulations  to  validate  the 
theoretical  predictions.  We  have  chosen  this  network 
structure  because  of  its  complexity  due  to  multiple 
paths  for  information  propagation,  and  the  availability 
of  the  optimal  analytical  solution  that  can  be  derived 
and  used  as  a  performance  baseline. 

Specifically,  we  consider  the  fusion  algorithms 
listed  below  and  compare  their  performance  against 
the  optimal  information  fusion  solution. 

Channel  filter 
Naive  fusion 
Chernoff  fusion 
Shannon  fusion 
Bhattacharyya  fusion 

Our  goal  is  to  investigate  how  these  different 
fusion  algorithms  perform  for  a  specific  scenario 
under  limited  communication  bandwidth.  This  is 
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Time 

Fig.  1.  Three  sensor  cyclic  communication  scenario. 


part  of  a  wider  objective  to  understand  the  system 
trades  involved  in  a  general  decentralized  ad  hoc 
sensor  network.  The  rest  of  this  paper  is  organized 
as  follows.  Section  II  briefly  describes  the  set  of 
scalable  distributed  fusion  algorithms  to  be  considered 
in  this  paper.  Section  III  derives  the  analytical  fusion 
performance  evaluation  in  terms  of  steady-state  mean 
square  error.  Section  IV  summarizes  the  technical 
findings  of  the  study,  and  Section  V  presents  some 
future  research  directions. 


II.  SCALABLE  FUSION  ALGORITHMS 


The  theoretic  fundamentals  of  distributed 
information  fusion  are  well  documented  and  have 
been  studied  in  depth  [7-11],  It  is  noted,  however, 
that  practical  applications  of  these  theoretical  results 
to  nondeterministic  information  flow  have  remained  a 
challenge.  The  main  difficulty  is  the  need  to  identify 
and  remove  common  information  from  the  data  sets 
to  be  fused,  while  minimizing  the  amount  of  data 
exchanged  between  agents. 

The  basic  fusion  process,  as  described  in  [7], 
follows  from  set  theory,  where  the  combination  of  n 
event  probabilities  $(•  |  /,)  given  the  information  7(  can 
be  represented  as 


1 

C 


nc' 


(i) 


where  Si  represents  the  combination  of  i  event 
probabilities  such  that,  .Sj  =  []"= ,  <l>(  ■  |  7(),  S2  = 
n;=ije{!-+i . „}  *($}/,  n/y) . s.  =  $(-|7in/2n-7„). 

The  alternating  multiplication  and  division  of 
joint  probabilities  from  (1)  removes  conditional 
dependencies  from  the  data  sets  in  the  form  of  shared 
information. 

While  the  removal  of  duplicate  information 
is  straightforward  in  the  theoretical  formulation, 
identification  of  duplicate  information  for  distributed 
estimation  systems  can  be  difficult  in  practical 
implementations.  The  difficulty  is  due  to  the  need 
to  recognize  correlated  information  resulting  from 
past  fusion  events  and  know  the  values  of  their  data 
sets.  The  information  graph  (IG)  technique  presented 


in  [7-9]  provides  an  analytical  tool  for  identifying 
duplicate  information  in  distributed  estimation 
systems.  The  approach  is  a  symbolic  representation  of 
the  collection,  propagation,  and  fusing  of  data  among 
a  set  of  fusion  agents.  An  example  of  an  IG  is  shown 
in  Fig.  1,  where  a  simple  cyclical  communications 
pattern  is  demonstrated.  Each  numbered  row  of 
symbols  represents  the  events  of  a  given  agent. 

Within  each  time  step,  each  agent  may  perform  time 
updates  of  estimates,  receive  sensor  data,  perform 
measurement  updates,  transmit  the  local  estimate  to 
other  agents,  and  fuse  estimates  received  from  other 
agents. 

The  difficulty  with  the  IG  approach  is  that  it 
is  communication  pattern  dependent — it  needs  to 
consider  all  relevant  common  priors  and  to  remove 
the  common  information  at  these  nodes  from  the 
current  track  update.  Determining  these  nodal 
connections  over  a  varying  network  can  be  difficult 
and  time-consuming.  For  example,  in  the  simple  three 
sensor  cyclic  communication  network  shown  in  Fig.  1, 
the  resulting  formula  for  the  fusion  between  the  first 
two  sensors  at  time  k  is  [7] 

,  ,  1 7i,tWf2,tWfu-3W 

P(x)  = - — - (2) 

c  Pl,k-2(x)P2,k-l(X) 


where  c  is  the  normalization  constant,  p(x)  is  the 
conditional  probability  at  node  v  I  after  fusion,  and 
pjk(x)  is  the  conditional  probability  at  node  si  and 
time  k  before  fusion.  In  the  case  when  all  probability 
densities  are  Gaussian,  the  fusion  formula  becomes 
(see  Fig.  1) 


P  i  —  p  i  _i_  p  i  _  p-i  _  p  i  _i_  p  i 

rk  ~  rl,k  '  r2,k  rli-2  r2,k-\xrk- 3 


Pk  %  =  Pl,kXU  +  P2,kX2,k  -  Pl,k—2Xl,k—2 
P2Jc-lx2,k- 1  +  Pk—3Xk—3  ■ 


(3) 


In  general,  to  construct  the  “optimal’'1  fusion  formula 
may  require  carrying  long  pedigree  information2  that 


1  The  IG  approach  is  optimal  when  the  underlying  system  is 
deterministic. 

2  Information  includes  communication  and  fusion  events  history  as 
well  as  past  fusion  data. 
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might  not  be  practical  in  an  environment  with  limited 
communication  bandwidth  [12]. 

To  address  the  scalability  issue,  we  have  developed 
each  of  the  fusion  algorithms  described  in  the 
following  sections  for  autonomous  sensors  in 
arbitrary  network  conditions.  All  of  these  approaches 
are  suboptimal  in  general  but  provide  adequate 
performance  when  basic  assumptions  are  met. 


A.  Channel  Filter 


The  channel  filter  approach  [13-16]  is  simpler 
than  IG  fusion  in  that  only  the  first  order  redundant 
information  is  considered.  Each  channel  is  defined 
by  a  pair  of  agents — a  transmitting  agent  and  a 
receiving  agent.  The  transmitting  agent  for  a  particular 
channel  is  responsible  for  removing  redundant 
information;  as  such,  it  needs  only  keep  track  of  the 
previous  transmission  from  itself  to  the  receiving 
node. 

However,  in  a  dynamic  ad  hoc  network,  the 
transmitting  data  may  never  reach  the  receiving  end 
because  of  link  uncertainty.  Therefore,  another  idea  is 
to  have  the  receiving  agent  of  a  particular  channel  be 
responsible  for  removing  the  redundant  information. 

In  this  way,  the  receiving  agent  only  needs  to  keep 
track  of  the  previous  data  transmitted  to  or  received 
from  the  channel  at  the  previous  communication 
time  and  remove  it  when  combining  the  current 
estimates.  There  is  no  need  to  maintain  long  histories 
of  previous  activity.  In  a  sense,  this  can  be  considered 
as  a  first  order  approximation  to  the  optimal  IG 
approach. 

Specifically,  the  channel  filter  fusion  equation  is 
given  as 


,  ,  =  px{x)p2(x)/p{x) 

j\pt(x)p2(x)/p(x)]dx 


(4) 


where  p ,  (x)  and  p2(x)  are  the  two  probability 
density  functions  to  be  fused  (one  local  and  the 
other  received  from  a  particular  channel)  and  p(x) 
is  the  density  function  received  from  the  same 
channel  at  the  previous  communication  time  and  is 
the  common  “prior  information”  to  be  removed  in 
the  fusion  formula.  When  both  p ,  (x)  and  p2(x)  are 
Gaussian  density  with  mean  and  covariance  xx,Px 
and  x2,P2,  respectively,  the  fused  state  estimate  and 
corresponding  covariance  error  can  be  written  as 


P-'  =P-1  +  P2l  —Pl 
Plx  =  Pl^lxl  +  PflX2  —  Plx. 


(5) 


While  simpler,  it  is  obvious  that  dependent 
information  is  more  likely  to  be  lost  in  the  channel 
filter  when  compared  with  the  IG  approach.  On  the 
other  hand,  if  the  time  between  when  that  redundancy 
occurred  and  the  current  processing  time  is  relatively 
long,  the  impact  could  be  minimal. 


B.  Naive  Fusion 


Naive  fusion  is  the  simplest  fusion  approach, 
where  it  is  assumed  that  the  dependency  between  the 
density  functions  is  negligible.  This  fusion  approach  is 
the  simplest  type,  but  it  can  be  unreliable.  The  naive 
fusion  formula  can  be  written  as 


p(x)  =  Pi(x)p2(x) 

j  pi(x)p2(x)dx' 


(6) 


For  the  Gaussian  case,  the  fused  state  estimate  and 
corresponding  error  covariance  are  shown  as 


p  1  =  P  1  +  p~l 

( 1 

P  lx  =  P^lXx  +  PplX2- 

Note  that  the  fused  track  covariance  is  the  inverse  of 
the  sum  of  the  inverses  of  the  local  track  covariance 
matrices.  Thus,  because  of  the  lack  of  common  prior 
information,  the  fused  covariance  could  be  much 
smaller,  which  can  lead  to  overconfidence.  Also  when 
the  common  prior  has  very  large  covariance,  (7)  is 
equivalent  to  (5). 


C.  Chernoff  Fusion 


When  the  dependency  between  two  distributions  is 
unknown,  one  idea  is  to  use  the  Chernoff  information 
[17],  The  fusion  formula  is  based  on  the  following: 


p(x)  = 


p'l’(x)p^(x) 

J  p'l’(x)p2^w  (x)dx 


(8) 


where  w  €  [0  1  ]  is  an  appropriate  parameter  that 
minimizes  a  chosen  criteria.  When  the  criterion  to  be 
minimized  is  the  Chernoff  information  as  defined  in 
the  denominator  of  (8),  we  call  it  Chernoff  fusion. 

It  can  be  shown  that  the  resulting  fused  density 
function  that  minimizes  the  Chernoff  information 
is  the  one  “halfway”  between  the  two  original 
densities  in  terms  of  the  Kullback  Leibler  distance 
[17,  p.  312].  In  the  case  when  both  p , (x)  and 
p2(x)  are  Gaussian,  the  resulting  fused  density  is 
also  Gaussian  with  mean  and  covariance  obtained 
as 


P  1  =  wPf1  +(1  -w)P2~l 
P~lx  =  wPf 1  xx  +  (1  -w)P2[x2. 


(9) 


This  formula  is  identical  to  the  covariance  intersection 
(Cl)  fusion  technique  [14-15].  Therefore,  the  Cl 
technique  can  be  considered  as  a  special  case  of  (8). 

In  theory,  Chernoff  fusion  can  be  used  to  combine  any 
two  arbitrary  density  functions  in  a  log-linear  fashion. 
However,  the  resulting  fused  density  may  not  preserve 
the  same  form  as  the  original  ones.  Also  in  general, 
obtaining  the  proper  weighting  parameter  to  satisfy 
a  certain  criterion  may  involve  extensive  search  or 
computation  [18]. 
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D.  Shannon  Fusion 

A  special  case  of  (8)  is  when  the  parameter  vv 
is  chosen  to  minimize  the  determinant  of  the  fused 
covariance  [18,  19].  In  the  Gaussian  case,  it  is 
equivalent  to  minimizing  the  Shannon  information 
of  the  fused  density.  This  is  because  the  Shannon 
information  defined  as  Is  =  —  Lp(x)\np(x)dx  can  be 
shown  to  be  equal  to  Is  =  ^ln((27r)"|/>|1/2)  +  n/1  when 
p(x)  is  Gaussian  with  covariance  P  [18].  We  call  this 
special  case  the  Shannon  fusion.  Note  that  with  (9), 
the  Shannon  information  is  a  convex  function  of  the 
parameter  vv,  and  therefore  the  maximum  is  located  at 
the  extreme  points  (either  vv  =  0  or  w  =  1).  Moreover, 
in  scalar  case  where  both  P{  and  P2  are  scalar,  the 
minimum  of  Shannon  information  is  also  located  at 
the  extremes  [18]. 

E.  Bhattacharyya  Fusion 

Another  special  case  of  (8)  is  when  the  parameter 
vv  is  set  to  be  0.5.  In  this  case,  the  denominator  of 
(8)  becomes  B  =  f  \/ pt  (x)p2(x)dx ,  which  is  the 
Bhattacharyya  bound.  We  call  the  resulting  fusion 
formula,  p(x)  =  ( 1  / B) y7/;,  (x)/u(x),  the  Bhattacharyya 
fusion.  When  both  p  ]  (x)  and  p2(x)  are  Gaussian,  the 
fusion  equation  can  be  written  as 

p  1  =  kpl-l+p2-1) 

P~lx  =  +  P2lx2)  (10) 

=>  x  -  fPj-1  +  p2~1)~1(p1~1x1  +  Pp'x2). 

Therefore,  in  the  Gaussian  case,  Bhattacharyya 
fusion  is  similar  to  naive  fusion;  the  resulting  fused 
covariance  is  merely  twice  as  big  as  that  of  naive 
fusion.  Note  that  the  fusion  equation  can  be  rewritten 
as 

p  1  =  kpk +p2')  =  (pt  1  +p2-i)-kpk  +pk) 

P~lx  =  kpil*\  +pi% )  (11) 

=  (  P,  1 X  |  +  PplX2)—  +  PplX  2). 

This  formula  replaces  the  common  prior 
information  of  (5)  for  the  channel  filter  by  the  average 
of  the  two  sets  of  information  to  be  fused.  Namely, 

P  l  <—  \(P\  '  +P>-1)  and  P  ^x  < —  ^(Pj^'xj  +  P,_1x2).  In 
other  words,  instead  of  removing  the  common  prior 
information  from  the  previous  communication,  as 
in  the  channel  filter  case,  the  common  information 
of  Bhattacharyya  fusion  is  approximated  by  the 
“average”  of  the  two  locally  available  information 
sets. 

In  the  next  section,  we  derive  the  analytical 
performance  of  channel  filter,  naive  fusion,  and 
Bhattacharyya  fusion  in  terms  of  true  steady-state 
mean  square  error.  We  will  derive  the  results  based  on 
the  specific  cyclic  communication  scenario  as  given  in 
Fig.  1.  We  will  also  conduct  extensive  simulation  to 
evaluate  other  alternative  fusion  algorithms. 


III.  ANALYTICAL  PERFORMANCE  PREDICTION 

As  shown  in  Fig.  1,  where  at  time  k  we  define 
\)  X  =  xk\k’p0  —  Pk\k  and  X  =  Xk-l\k-l’P  —  Pk-\\_k-l  aS 
the  fused  state  estimates  and  the  associated  filter 
covariances  at  time  k  and  k  I ;  2)  xl  =  xik\k\Pi  =  Pik\k 
as  the  local  updated  state  estimates  and  the  associated 
filter  covariances;  and  3)  xt  =  xUk-\\k-^Pi  =  Pi,k-i\k-\  as 
the  local  updated  state  estimates  and  the  associated 
filter  covariances  at  the  previous  time  instance 
k-  1. 

Our  goal  is  to  find  the  steady-state  mean  square 
error  covariance  of  the  fused  estimate,  namely,  fl  = 

limA^oo£[(^|t  ~xk)&k\k  ~xk)'}  =  E\(x  -  x)(x  -  x)']. 

In  the  following,  we  assume  that  the  dynamic  system 
follows  a  scalar  random  walk  model,  namely,  xk+]  = 
xk  +  vk  where  vk  is  a  zero  mean  Gaussian  process 
noise  with  variance  Q.  We  further  assume  that  the 
observation  model  is  similar  for  the  three  sensors 
and  is  linear  Gaussian,  i.e.,  zik  =  xk  +  vv,  k  where  vv,  k 
is  a  zero-mean  Gaussian  measurement  noise  with 
variance  A',  for  sensor  i.  In  the  following,  we  assume 
that  the  sensors  have  the  same  quality,  i.e.,  A',  =  R2  = 
R3  =  R.  Therefore,  in  steady  state,  let  A]  =  P,  then 

Pi  =  Pi,k-l\k-l  =  Pi,k\k  =  P/  =  P  ■ 

A.  Channel  Filter 

With  a  channel  filter,  as  shown  in  (5),  the  fusion 
equations  are  written  as 

=Pf P2-|^.  (!2) 

P0~lX  =  Pl  lX i  +  P2  %  -  P2~k\k-lX2Jc\k- 1  •  ( 13> 

Equation  (13)  can  be  rewritten  as, 

A  =  PqX(x  —  x)  =  P\~l{x,  j  —  x)  +  P2l(x2  —  x) 

~i%k\k-l(X2,k\k-l-x) 

=  P_1(X[  — x)  +  P~\x2  —  x)  —  (P  +  Q)-\x2  — x) 

=>  (x  —  x)  =  PqA 
=  PQP^1(xl  — x)  +  AgP_1(x2  —  x) 

-P0(P  +  Q)-'(x2-x).  (14) 

Therefore, 

n  =  E[(x  -  x)2]  =  P0E(AA')Pq.  (15) 

In  the  scalar  case, 

(*  -  x)2  =  xf  +  %(-X2  -  xj2  +  {pP°Q)  2(x2  -  xf 

2P^{xl  —  x)(x2  —  x)  2  P02(Xj  —  x)(x7  —  x) 

+  P2  P(P  +  Q) 

~Pq(x2  —  x)(x2  —  x) 

P(P  +  Q) 
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Fig.  2.  Steady-state  filter  variances  P  and  PQ  for  various  Q  and  R. 


ip}  .  p2 
^Q  =  _^(5  +  £0  +  _21_(5  +  G) 

2P,2 

-77 +  C2) 


pop  +  er 


(17) 


where 


£[(*2  -*)2]  =  £[(*2,*|*-t  -**)2] 

=  E[(X  2,k_l\k- 
=  B  +  Q 

D  —  Z71/C.  ,,\2 


c"i  ' 

1 

1 

1 

K* 

1 

(18) 

=  E[(x2  -x)2] 

(19) 

:2-x)] 

(20) 

2  -  x)]  and 

:2-x)]. 

(21) 

P  =  (Pq  +  Q)-KSK'  =  (Pq  +  Q) 
(P0  +  0)A> 


(^o  +  0)2 

(Po  +  Q  +  R) 


( P0  +  Q  +  R ) 


where  A"  =  (7),  +  Q)/(Pq  +  0  +  R)  is  the  steady-state 
Kalman  gain  and  S  =  PQ  +  Q  +  R  is  the  steady  state 
innovation  variance.  From  (22)  and  (23),  it  can  be 


easily  shown  that 

P3  +  (2 Q)P2  +  (2 Q2)P  -  (2 Q2)R  =  0.  (24) 

A  closed  form  real  solution  of  the  preceding  cubic 
polynomial  can  be  solved  and  the  resulting  P0  as  a 
function  of  various  Q  and  R  is  shown  in  Fig.  2. 

Note  that  the  filter  variance  is  not  the  same  as  the 
true  mean  square  error.  To  obtain  the  true  mean  square 
error  as  given  in  (17),  we  will  need  to  derive  each  of 
the  three  terms  listed  in  (19) — (21).  It  can  be  shown 
that  (the  details  are  omitted), 

B  =  (l-K)2fl  +  (l-K)2Q+K2R  =  Xfl  +  a  (25) 

E’  =  (1  -  K)2{E[(x\  -x,_,)(x'  -**_!)]  +  Q } 


Note  that  in  (17),  PQ  and  P  are  the  steady-state  “filter” 
variances.  They  can  be  obtained  by  solving  the 
following  two  equations: 

V  =pl-1+p2~1-  p2~J]k. ,  =  2 P  1  -  (P  +  er1 

^P0  =  P(P  +  Q)/(P  +  2Q)  (22) 


=  (1  -K)2[Ep  +  Q] 

E,,  =  ( ^  (3 E’  +  B)  +  )  (£'  +  0) 


(26) 


P 


( 


2 


\P(P  +  0) 
Cy  =  C2  = 


(C^  +  C2  +  2D) 


(1  -K)(%  +  ^E’  +  Q 


(27) 


1+(1-K)pTq 


(23) 


(1  -K)(P  +  Q)  (1-KKP  +  2Q) 

(A  +B)+——— - —  0 


2  (P  +  Q)-KP 
=  r/(E'  +  B)  +  rf 


2  (P  +  Q)-KP 


and 


D  =  77(2  E’)  +  ?/. 


(28) 

(29) 
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Process  Noise  O 

Fig.  3.  Comparison  of  channel  filter  analytical  MSE  with  simulated  MSE  (1000  MC  trials). 


Using  relations  (25)-(29),  (17)  can  be  rewritten  as 

9  P-  P2  9  P2 

n.  =  =-f(P  +  E')  + - £ — 2(B  +  Q) - — ^ - (2C) 

p 2  ( p  +  q )2  ^  P(P  +  Qy  J 

=  (2p0  ,  Po  <  :1\B 

\P2  (P  +  Q)2  P(P  +  Q )  ') 


P2  P(P  +  Q) 


( P  +  Q )2  P(P  +  6r 

=  /  j  B  +  l2  E'  ■+■  /■;  —  (7|  +  /  2  ,/|  )7I  +  /2  f2  +  (3 
=  mlB  +  m2  =  m^AU  +  a)  +  m2 

_  m,a  +  m7 

- T- 

1  —  W|  A 


Fig.  3  compares  the  analytical  mean  square  errors 
(MSE)  based  on  (30)  with  the  average  MSE  based 
on  1000  Monte  Carlo  simulation  trials.  It  is  clear  that 
they  are  in  perfect  agreement.  Fig.  3  also  shows  that 
the  filter  variance  P0  is  very  close  to  the  true  MSE, 
which  indicates  that  the  algorithm  behaves  well  and  is 
reasonably  consistent  [9]. 


B.  Naive  Fusion 

With  the  notations  defined  earlier,  the  naive  fusion 
equations  can  be  written  as 

P0  =  (P-'  +P2-'y  1  =P/2  (31) 


x  =  ToU’i  lx1  +  P2  lx2)  =  (X|  +  x2)/2.  (32) 

From  (31),  (x  —  x)  =  [(A,  -  x)  +  (x2  —  x)]/2;  therefore, 

Q  =  £[(A-x)2] 

=  \ \E\(x]  -  x)2]  +  E\(x2  -  x)2]  +  2E\(x]  -  x)(x2  -  x)]] 

=  I(5  +  F)  (33) 

where,  as  defined  before,  B  =  ^[(xj  —  x)2]  = 

£[(x2  —  x)2]  and  E’  =  E^x^  —  x)(x2  —  x)]. 

From  (25),  B  =  Af l  +  a,  and  from  (26), 

E'  =  £[(x j  -xk)(x2-xk)\ 

=  (1  -K)2{E[{1[  xk_ !  )(x2  xk_ ! )]  +  Q } 

=  \{\-K)\B  +  ?,E'  +AQ) 

=»  £'  =  4-3(1  %r-{B  *  4e)  =  M*  +  4®. 

(34) 

Therefore,  from  (25),  (33),  and  (34),  we  have. 


Cl  =  \{B  +  E')  =  1(1  +ii)B  +  2  vQ 
—  4(1  +  //)( AD  +  ct)  +  2^iQ  =>  D 


(1  +  h)q./2  +  2  fiQ 

1  —  (1  +/r)A/2 


(35) 


Note  that  in  (31),  /g  and  P  are  the  steady-state  “filter” 
variances,  which  are  not  the  same  as  the  true  MSEs. 
They  can  be  obtained  by  solving  the  following  two 
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Fig.  4.  Comparison  of  naive  fusion  analytical  MSE  with  simulated  MSE  (1000  MC  trials). 


equations: 


C.  Bhattacharyya  Fusion 


P0  1  =  Pf1 1  +  P2~l  =  2  P  1  ^PQ=P/ 2  (36) 

( P  +  O)2 

P  =  (Pn  +  Q)  —  KSK'  =  (Pn  +  Q)  -  0  ^  N 

v°  ^  0  ^  (P0  +  0  + /?) 


(P/2  +  Q)R 
(P/2  +  Q  +  RY 


(37) 


As  in  the  naive  fusion  case,  the  Bhattacharyya 
fusion  equations  can  be  written  as 

P0  =  2(P-'  +P2-')-]  =P  (39) 

x  =  +  Pflx  2)  =  (X|  +x2)/2.  (40) 

As  in  (33) 


From  (36)  and  (37),  it  can  be  easily  shown  that 


n  =  £[(x-x)2] 


P2  +  (2Q  +  R)P  -  2Qtf  =  0 

_  ygg  +  i?)2  +  -  (2Q  +  /?) 

2 

(38) 

Fig.  4  compares  the  analytical  MSEs  based  on  (35) 
with  the  average  MSE  based  on  1000  Monte  Carlo 
simulation  trials.  It  is  clear  that  they  are  very  close 
to  each  other  when  the  process  noise  is  not  very 
small.  However,  when  the  process  noise  is  extremely 
small  (<  10  4),  the  simulation  results  are  slightly 
lower  than  the  analytical  prediction.  This  could  be 
due  to  numerical  round  off  error  caused  by  the  small 
magnitude  of  the  noise.  Fig.  4  also  shows  that  the 
steady-state  filter  variances  PQ  are  significantly  smaller 
than  the  true  MSE,  especially  when  the  process  noise 
is  not  very  large.  This  implies  that  naive  fusion 
is  too  optimistic  and  has  poor  filter  consistency 
[11]. 


=  \{E\&  1  -  *)2]  +  E[(x2  -  x)2]  +  2£[(xj  -  x)(x2  -  x)]} 

=  b(B  +  E’)  (41) 

where,  as  defined  before,  B  =  XQ  +  a  and  E’  = 

[(1  -  K)2/ 4  -  3(1  -  K)2](B  +  4 Q)  =  ji(B  +  4 Q). 
Therefore,  as  in  (35) 

=  (l+pW2  +  2pg  (42) 

1  —  (1  +  p)A/2  V  ’ 

Note  that  the  only  difference  between  naive  and 
Bhattacharyya  fusion  is  in  (38),  where  P0  and  P 
are  the  steady-state  “filter”  variances,  which  can  be 
obtained  by  solving  the  following  equation: 


p  =  (/>„  +  c>  -  ksk'  =  <p„ +a>- 


( P  +  Q)R 

(P  +  Q+RY 


(43) 


2028 


IEEE  TRANSACTIONS  ON  AEROSPACE  AND  ELECTRONIC  SYSTEMS  VOL.  46,  NO.  4  OCTOBER  2010 


Process  Noise  Q 

Fig.  5.  Comparison  of  Bhattacharyya  fusion  analytical  and  simulated  MSE  (1000  MC  trials). 


From  (43),  it  can  be  easily  shown  that  while  Chernoff  fusion  and  Bhattacharyya  fusion 

. — _ -  perform  slightly  worse.  Note  that  when  ah  sensors 

p 2  +  pQ  _  Qp  _  q  p  =  vQ~  +  ~  Q  have  the  same  quality,  Chernoff  fusion  converges  to 

2  Bhattacharyya  fusion. 

(44)  We  then  evaluate  the  fusion  algorithms  with 
Fig.  5  compares  the  analytical  MSEs  based  on  (41)  different  sensor  qualities.  Instead  of  homogeneous 

with  the  average  MSE  based  on  1000  Monte  Carlo  quality  as  in  the  previous  case,  the  sensor 

simulation  trials.  Again,  they  are  in  perfect  agreement.  measurement  error  variances  are  set  as  0.5,  1.0, 

However,  as  can  be  seen  in  the  figure,  a  critical  and  2.0  for  the  three  sensors,  respectively.  The 

issue  with  this  approach  is  that  the  steady-state  results  are  shown  in  Fig.  8,  which  compares  the 

filter  variances  are  almost  twice  as  large  as  the  performance  of  channel  filter,  Chernoff  fusion,  and 

true  MSE.  This  indicates  that  the  Bhattacharyya  Bhattacharyya  fusion  versus  optimal  fusion.  From  the 

fusion  algorithm  is  too  pessimistic  and  is  severely  figure,  it  is  clear  that  channel  filter  performs  the  best, 

inconsistent.  Bhattacharyya  fusion  performs  slightly  worse,  while 

Chernoff  fusion  performs  the  worst  among  the  three, 

IV.  SIMULATION  RESULTS  AND  DISCUSSION  particularly  when  the  process  noise  is  large. 

To  simulate  the  stochastic  nature  of  the 

In  addition  to  the  theoretical  analysis  for  channel  communication  link,  we  model  the  reliability  of  each 
filter,  naive  fusion,  and  Bhattacharyya  fusion,  we  link  with  a  probabilistic  measure.  For  example,  a 

conducted  extensive  simulation  for  Chernoff  fusion  link  with  0.5  reliability  means  that  the  information 

and  Shannon  fusion  to  compare  their  performances  will  pass  through  the  channel  only  50%  of  the  time, 

against  optimal  centralized  fusion.  The  results  are  We  then  test  the  three  fusion  algorithms  and  their 

shown  in  Fig.  6.  As  can  be  seen,  in  addition  to  naive  robustness  under  various  link  reliabilities.  Because 

fusion,  Shannon  fusion  also  performs  poorly.  This  is  all  algorithms  under  consideration  are  scalable  and 

because  in  the  scalar  case,  Shannon  fusion  essentially  autonomous,  no  additional  changes  are  necessary  in 
picks  the  density  with  smaller  variance.  Therefore  the  algorithms  for  the  test.  The  results  in  Fig.  9  show 

the  fusion  performance  converges  to  single  sensor  that  the  performances  are  in  general  proportional  to 

performance  when  the  sensor  qualities  are  identical.  the  communication  quality,  which  is  quite  intuitive. 

As  shown  in  Fig.  6,  the  remaining  three  algorithms  The  results  also  show  that  ah  three  algorithms 
have  very  similar  performance.  A  closer  look  (Fig.  7)  are  quite  stable  and  they  perform  according  to 
reveals  that  channel  filter  performs  close  to  optimal  expectation. 
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Fig.  6.  Comparison  of  alternative  fusion  algorithms  with  optimal  fusion  algorithm  (Rf  =  ft,  =  R}  =  1). 


icr2  to'1  io°  io1  io2 


Process  Noise  O 

Fig.  7.  Comparison  of  channel  filter,  Chernoff  fusion,  and  Bhattacharyya  fusion  (Rf  =  ft,  =  S,  =  1). 


It  should  be  noted  that  channel  filter,  while 
requiring  a  one-step  memory  to  retrieve  and  remove 
the  common  prior  information  in  each  channel,  has  a 
rather  simple  implementation.  On  the  other  hand,  the 
Chernoff  fusion  algorithm,  in  addition  to  its  poor  filter 
consistency,  needs  significantly  more  computation 
to  search  for  the  optimal  weighting  factor.  Our 
preliminary  experiments  show  that  channel  filter  is  at 
least  one  order  of  magnitude  faster  than  the  Chernoff 
fusion.  Further  investigation  is  needed  to  compare  the 
trade-offs  between  these  promising  algorithms  in  a 
more  reliable  manner. 


V.  SUMMARY 

In  this  paper,  we  focus  on  the  analysis  and 
comparison  of  several  scalable  algorithms  for 
distributed  fusion  in  a  cyclic  communication  sensor 
network.  Specifically,  we  evaluate  the  performance 
of  channel  filter  fusion,  naive  fusion,  Chernoff 
fusion.  Shannon  fusion,  and  Bhattacharyya  fusion 
algorithms.  We  also  compare  their  performance  to 
“optimal”  centralized  fusion  algorithms  under  a 
specific  communication  pattern. 

The  results  show  that  naive  fusion  and  Shannon 
fusion  perform  poorly  while  several  other  scalable 
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Fig.  8.  Channel  filter,  Chernoff  fusion,  and  Bhatacharyya  fusion  versus  optimal  fusion  with  sensors  of  different  qualities 

(«!  =  0.5,  R2  =  1.0,  R3  =  2.0). 


P  rocess  Noise  Variance 

Fig.  9.  Channel  filter  versus  Bhattacharyya  fusion  with  various  communication  link  qualities  (R j  =  R~,  =  R3  =  1). 

algorithms  including  channel  filter,  Chernoff  fusion  algorithm.  In  particular  the  channel  filter 

fusion,  and  Bhattacharyya  fusion,  require  minimum  fusion,  representing  a  first-order  approximation  to  IG 
communication  and  perform  fairly  well.  Their  fusion,  works  surprisingly  well  and  has  been  shown  to 

performance  is  comparable  with  that  of  the  optimal  be  the  only  “consistent”  fusion  algorithm. 
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One  of  the  future  research  directions  is  to  extend 
and  validate  the  results  to  more  general  network 
scenarios.  In  particular,  to  address  the  real  world 
network-centric  tracking  and  fusion  problems.  It  is 
important  to  consider  heterogeneous  sensors  with 
different  sampling  interval  and  error  characteristics 
under  dynamic  communication  topology  and 
constraints.  It  is  also  useful  to  develop  theoretical 
analysis  for  specific  algorithms  whenever  possible  for 
a  given  network  scenario. 
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